Yes. This would work. I had trouble compiling in single precision using some of the external package options I was using for double.
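
For reference, here is a minimal sketch of the GPUReal idea discussed below, together with an explicit host-side conversion of PetscScalar coefficients before copying them to the card. The type name GPUReal and the helper CopyCoefficientsToGPU are only illustrative assumptions for this thread, not actual petsc-dev code:

/* Sketch only: a device scalar type decoupled from PetscScalar on the host.
   GPUReal and CopyCoefficientsToGPU are placeholders for the approach being
   discussed, not existing petsc-dev symbols. */
#include <petscsys.h>
#include <cuda_runtime.h>
#include <stdlib.h>

typedef float GPUReal;  /* kernels stay in single precision even when PetscScalar is double */

static PetscErrorCode CopyCoefficientsToGPU(PetscInt n, const PetscScalar host[], GPUReal **dev)
{
  GPUReal    *tmp;
  PetscInt    i;
  cudaError_t err;

  PetscFunctionBegin;
  tmp = (GPUReal *) malloc(n*sizeof(GPUReal));
  if (!tmp) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_MEM, "Unable to allocate staging buffer");
  for (i = 0; i < n; ++i) tmp[i] = (GPUReal) PetscRealPart(host[i]);  /* narrow double -> float */
  err = cudaMalloc((void **) dev, n*sizeof(GPUReal));
  if (err != cudaSuccess) {free(tmp); SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "cudaMalloc failed");}
  err = cudaMemcpy(*dev, tmp, n*sizeof(GPUReal), cudaMemcpyHostToDevice);
  if (err != cudaSuccess) {free(tmp); SETERRQ(PETSC_COMM_SELF, PETSC_ERR_LIB, "cudaMemcpy failed");}
  free(tmp);
  PetscFunctionReturn(0);
}

With something along these lines, the sizeof(float) in the cudaMalloc calls quoted below would match the data actually shipped to the device, regardless of whether PetscScalar is single or double on the host.
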
On Wed, Mar 28, 2012 at 4:57 PM, Matthew Knepley <knepley at gmail.com> wrote: > On Wed, Mar 28, 2012 at 4:12 PM, David Fuentes <fuentesdt at gmail.com>wrote: > >> works! >> > > Excellent. Now, my thinking was that GPUs are most useful doing single > work, but > I can see the utility of double accuracy for a residual. > > My inclination is to define another type, say GPUReal, and use it for all > kernels. > Would that do what you want? > > Matt > > >> SCRGP2$ make ex52 >> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC >> -I/opt/apps/PETSC/petsc-dev/include >> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include >> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 >> -D__INSDIR__=src/snes/examples/tutorials/ ex52.c >> nvcc -G -O0 -g -arch=sm_10 -c --compiler-options="-O0 -g -fPIC >> -I/opt/apps/PETSC/petsc-dev/include >> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include >> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 >> -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu >> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but >> never referenced >> >> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but >> never referenced >> >> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was >> declared but never referenced >> >> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but >> never referenced >> >> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never >> referenced >> >> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was >> declared but never referenced >> >> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o >> >> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib >> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib >> -lpetsc >> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib >> -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64 >> -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse >> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib >> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 >> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 >> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt >> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx >> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl >> >> >> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu >> GPU layout grid(1,2,1) block(3,1,1) with 1 batches >> N_t: 3, N_cb: 1 >> Residual: >> Vector Object: 1 MPI processes >> type: seq >> -0.25 >> -0.5 >> 0.25 >> -0.5 >> -1 >> 0.5 >> 0.25 >> 0.5 >> 0.75 >> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch >> Residual: >> Vector Object: 1 MPI processes >> type: seq >> -0.25 >> -0.5 >> 0.25 >> -0.5 >> -1 
>> 0.5 >> 0.25 >> 0.5 >> 0.75 >> >> >> >> >> >> On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <fuentesdt at gmail.com>wrote: >> >>> sure. will do. >>> >>> >>> On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <knepley at >>> gmail.com>wrote: >>> >>>> On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <fuentesdt at >>>> gmail.com>wrote: >>>> >>>>> thanks! its running, but I seem to be getting different answer for >>>>> cpu/gpu ? >>>>> i had some floating point problems on this Tesla M2070 gpu before, but >>>>> adding the '-arch=sm_20' option seemed to fix it last time. >>>>> >>>>> >>>>> is the assembly in single precision ? my 'const PetscReal >>>>> jacobianInverse' being passed in are doubles >>>>> >>>> >>>> Yep, that is the problem. I have not tested anything in double. I have >>>> not decided exactly how to handle it. Can you >>>> make another ARCH --with-precision=single and make sure it works, and >>>> then we can fix the double issue? >>>> >>>> Thanks, >>>> >>>> Matt >>>> >>>> >>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu >>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches >>>>> N_t: 3, N_cb: 1 >>>>> Residual: >>>>> Vector Object: 1 MPI processes >>>>> type: seq >>>>> 0 >>>>> 755712 >>>>> 0 >>>>> -58720 >>>>> -2953.13 >>>>> 0.375 >>>>> 1.50323e+07 >>>>> 0.875 >>>>> 0 >>>>> SCRGP2$ >>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch >>>>> Residual: >>>>> Vector Object: 1 MPI processes >>>>> type: seq >>>>> -0.25 >>>>> -0.5 >>>>> 0.25 >>>>> -0.5 >>>>> -1 >>>>> 0.5 >>>>> 0.25 >>>>> 0.5 >>>>> 0.75 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <knepley at >>>>> gmail.com>wrote: >>>>> >>>>>> On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <fuentesdt at >>>>>> gmail.com>wrote: >>>>>> >>>>>>> The example seems to be running on cpu with '-batch' but i'm getting >>>>>>> errors in line 323 with the '-gpu' option >>>>>>> >>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in >>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu >>>>>>> >>>>>>> should this possibly be PetscScalar ? >>>>>>> >>>>>> >>>>>> No. 
>>>>>> >>>>>> >>>>>>> - ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * >>>>>>> sizeof(float));CHKERRQ(ierr); >>>>>>> + ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * >>>>>>> sizeof(PetscScalar));CHKERRQ(ierr); >>>>>>> >>>>>>> >>>>>>> SCRGP2$ python >>>>>>> $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1 >>>>>>> laplacian ex52.h >>>>>>> ['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py', >>>>>>> '2', '1', '1', '1', 'laplacian', 'ex52.h'] >>>>>>> 2 1 1 1 laplacian >>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, >>>>>>> 1.0): [(1.0, ())]}] >>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}} >>>>>>> Perm: [0, 1, 2] >>>>>>> Creating /home/fuentes/snestutorial/ex52.h >>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu.h >>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, >>>>>>> 1.0): [(1.0, ())]}] >>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}} >>>>>>> Perm: [0, 1, 2] >>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu_inline.h >>>>>>> >>>>>>> SCRGP2$ make ex52 >>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC >>>>>>> -I/opt/apps/PETSC/petsc-dev/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include >>>>>>> -I/opt/apps/cuda/4.1/cuda/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/include/sieve >>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include >>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c >>>>>>> nvcc -O0 -g -arch=sm_20 -c --compiler-options="-O0 -g -fPIC >>>>>>> -I/opt/apps/PETSC/petsc-dev/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include >>>>>>> -I/opt/apps/cuda/4.1/cuda/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/include/sieve >>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include >>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include >>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/" >>>>>>> ex52_integrateElement.cu >>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared >>>>>>> but never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was >>>>>>> declared but never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared >>>>>>> but never referenced >>>>>>> >>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but >>>>>>> never referenced >>>>>>> >>>>>>> 
ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was >>>>>>> declared but never referenced >>>>>>> >>>>>>> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o >>>>>>> >>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib >>>>>>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib >>>>>>> -lpetsc >>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib >>>>>>> -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps >>>>>>> -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs >>>>>>> -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64 >>>>>>> -lcufft -lcublas -lcudart -lcusparse >>>>>>> -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64 >>>>>>> -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64 >>>>>>> -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc >>>>>>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib >>>>>>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus >>>>>>> -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 >>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt >>>>>>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx >>>>>>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl >>>>>>> /bin/rm -f ex52.o ex52_integrateElement.o >>>>>>> >>>>>>> >>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch >>>>>>> Residual: >>>>>>> Vector Object: 1 MPI processes >>>>>>> type: seq >>>>>>> -0.25 >>>>>>> -0.5 >>>>>>> 0.25 >>>>>>> -0.5 >>>>>>> -1 >>>>>>> 0.5 >>>>>>> 0.25 >>>>>>> 0.5 >>>>>>> 0.75 >>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu >>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in >>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu >>>>>>> [0]PETSC ERROR: FormFunctionLocalBatch() line 679 in >>>>>>> src/snes/examples/tutorials/ex52.c >>>>>>> [0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in >>>>>>> src/snes/utils/damgsnes.c >>>>>>> [0]PETSC ERROR: main() line 1021 in >>>>>>> src/snes/examples/tutorials/ex52.c >>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0 >>>>>>> >>>>>> >>>>>> This is failing on cudaMalloc(), which means your card is not >>>>>> available for running. Are you trying to run on your laptop? >>>>>> If so, applications like Preview can lock up the GPU. I know of no >>>>>> way to test this in CUDA while running. I just close >>>>>> apps until it runs. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Matt >>>>>> >>>>>> >>>>>>> >>>>>>> On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <knepley at >>>>>>> gmail.com>wrote: >>>>>>> >>>>>>>> On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <bourdin at >>>>>>>> lsu.edu>wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote: >>>>>>>>> >>>>>>>>> On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes < >>>>>>>>> fuentesdt at gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I had a question about the status of example 52. 
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c >>>>>>>>>> >>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Can this example be used with a DM object created from an >>>>>>>>>> unstructured exodusII mesh, DMMeshCreateExodus, And the FEM assembly >>>>>>>>>> done >>>>>>>>>> on GPU ? >>>>>>>>>> >>>>>>>>> >>>>>>>>> 1) I have pushed many more tests for it now. They can be run using >>>>>>>>> the Python build system >>>>>>>>> >>>>>>>>> ./config/builder2.py check src/snes/examples/tutorials/ex52.c >>>>>>>>> >>>>>>>>> in fact, you can build any set of files this way. >>>>>>>>> >>>>>>>>> 2) The Exodus creation has to be converted to DMComplex from >>>>>>>>> DMMesh. That should not take me very long. Blaise maintains that >>>>>>>>> so maybe there will be help :) You will just replace >>>>>>>>> DMComplexCreateBoxMesh() with DMComplexCreateExodus(). If you request >>>>>>>>> it, I will bump it up the list. >>>>>>>>> >>>>>>>>> >>>>>>>>> DMMeshCreateExodusNG is much more flexible than DMMeshCreateExodus >>>>>>>>> in that it can read meshes with multiple element types and should >>>>>>>>> have a >>>>>>>>> much lower memory footprint. The code should be fairly easy to read. >>>>>>>>> you >>>>>>>>> can email me directly if you have specific questions. I had looked at >>>>>>>>> creating a DMComplex and it did not look too difficult, as long as >>>>>>>>> interpolation is not needed. I have plans to write >>>>>>>>> DMComplexCreateExodus, >>>>>>>>> but haven't had time too so far. Updating the Vec viewers and readers >>>>>>>>> may >>>>>>>>> be a bit more involved. In perfect world, one would write an EXODUS >>>>>>>>> viewer >>>>>>>>> following the lines of the VTK and HDF5 ones. >>>>>>>>> >>>>>>>> >>>>>>>> David and Blaise, I have converted this function, now >>>>>>>> DMComplexCreateExodus(). Its not tested, but I think >>>>>>>> Blaise has some stuff we can use to test it. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Matt >>>>>>>> >>>>>>>> >>>>>>>>> Blaise >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Let me know if you can run the tests. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> Matt >>>>>>>>> >>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> What most experimenters take for granted before they begin their >>>>>>>>> experiments is infinitely more interesting than any results to which >>>>>>>>> their >>>>>>>>> experiments lead. >>>>>>>>> -- Norbert Wiener >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Department of Mathematics and Center for Computation & Technology >>>>>>>>> Louisiana State University, Baton Rouge, LA 70803, USA >>>>>>>>> Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276 >>>>>>>>> http://www.math.lsu.edu/~bourdin >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> What most experimenters take for granted before they begin their >>>>>>>> experiments is infinitely more interesting than any results to which >>>>>>>> their >>>>>>>> experiments lead. >>>>>>>> -- Norbert Wiener >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> What most experimenters take for granted before they begin their >>>>>> experiments is infinitely more interesting than any results to which >>>>>> their >>>>>> experiments lead. 
>>>>>> -- Norbert Wiener >>>>> >>>> >>>> -- What most experimenters take for granted before they begin their >>>> experiments is infinitely more interesting than any results to which their >>>> experiments lead. >>>> -- Norbert Wiener >>> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener >
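
As a companion to the DMComplexCreateExodus discussion above, a possible user-side sketch for reading an ExodusII mesh instead of a generated box mesh follows. The DMComplex argument lists shown here are assumptions based on the thread (the interface was still in flux in petsc-dev at the time), so the exact signatures and header name should be checked against the current sources:

/* Sketch only: the DMComplexCreateExodus()/DMComplexCreateBoxMesh() argument
   lists are assumed from the discussion above, not taken from petsc-dev headers. */
#include <petscdmcomplex.h>   /* header name assumed */
#include <exodusII.h>

static PetscErrorCode CreateMeshFromExodus(MPI_Comm comm, const char filename[], DM *dm)
{
  int            CPU_word_size = sizeof(PetscReal), IO_word_size = 0, exoid;
  float          version;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  exoid = ex_open(filename, EX_READ, &CPU_word_size, &IO_word_size, &version);
  if (exoid <= 0) SETERRQ1(comm, PETSC_ERR_LIB, "ex_open() failed for %s", filename);
  /* replaces the DMComplexCreateBoxMesh() call used by ex52 for generated meshes */
  ierr = DMComplexCreateExodus(comm, exoid, PETSC_TRUE, dm);CHKERRQ(ierr);
  ex_close(exoid);
  PetscFunctionReturn(0);
}
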