Yes. This would work.
I had trouble compiling in single precision using some of the external
package options I was using for double.



On Wed, Mar 28, 2012 at 4:57 PM, Matthew Knepley <knepley at gmail.com> wrote:

> On Wed, Mar 28, 2012 at 4:12 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>
>> works!
>>
>
> Excellent. Now, my thinking was that GPUs are most useful doing
> single-precision work, but I can see the utility of double accuracy for a
> residual.
>
> My inclination is to define another type, say GPUReal, and use it for all
> kernels.
> Would that do what you want?
>
>    Matt
>
>
>> SCRGP2$ make ex52
>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
>> -I/opt/apps/PETSC/petsc-dev/include
>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
>> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
>> -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>> nvcc -G -O0 -g -arch=sm_10  -c --compiler-options="-O0 -g   -fPIC
>>  -I/opt/apps/PETSC/petsc-dev/include
>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include
>> -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2
>>  -D__INSDIR__=src/snes/examples/tutorials/"  ex52_integrateElement.cu
>>  ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>> never referenced
>>
>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
>> never referenced
>>
>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>> declared but never referenced
>>
>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but
>> never referenced
>>
>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never
>> referenced
>>
>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>> declared but never referenced
>>
>> /usr/bin/mpicxx -O0 -g   -o ex52 ex52.o ex52_integrateElement.o
>>  
>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>>  -lpetsc
>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib
>> -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64
>> -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse
>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5
>> -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>
>>
>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>  N_t: 3, N_cb: 1
>> Residual:
>> Vector Object: 1 MPI processes
>>   type: seq
>> -0.25
>> -0.5
>> 0.25
>> -0.5
>> -1
>> 0.5
>> 0.25
>> 0.5
>> 0.75
>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>> Residual:
>> Vector Object: 1 MPI processes
>>   type: seq
>> -0.25
>> -0.5
>> 0.25
>> -0.5
>> -1
>> 0.5
>> 0.25
>> 0.5
>> 0.75
>>
>>
>>
>>
>>
>> On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>>
>>> sure. will do.
>>>
>>>
>>> On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>>> On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>>
>>>>> Thanks! It's running, but I seem to be getting different answers on the
>>>>> CPU and the GPU.
>>>>> I had some floating-point problems on this Tesla M2070 GPU before, but
>>>>> adding the '-arch=sm_20' option seemed to fix it last time.
>>>>>
>>>>> Is the assembly in single precision? The 'const PetscReal
>>>>> jacobianInverse' values I'm passing in are doubles.
>>>>>
>>>>
>>>> Yep, that is the problem. I have not tested anything in double. I have
>>>> not decided exactly how to handle it. Can you
>>>> make another ARCH --with-precision=single and make sure it works, and
>>>> then we can fix the double issue?
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>
>>>>
>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>>>>  N_t: 3, N_cb: 1
>>>>> Residual:
>>>>> Vector Object: 1 MPI processes
>>>>>   type: seq
>>>>> 0
>>>>> 755712
>>>>> 0
>>>>> -58720
>>>>> -2953.13
>>>>> 0.375
>>>>> 1.50323e+07
>>>>> 0.875
>>>>> 0
>>>>> SCRGP2$
>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>> Residual:
>>>>> Vector Object: 1 MPI processes
>>>>>   type: seq
>>>>> -0.25
>>>>> -0.5
>>>>> 0.25
>>>>> -0.5
>>>>> -1
>>>>> 0.5
>>>>> 0.25
>>>>> 0.5
>>>>> 0.75
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>>
>>>>>> On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>>>>
>>>>>>> The example seems to run on the CPU with '-batch', but I'm getting
>>>>>>> errors at line 323 with the '-gpu' option:
>>>>>>>
>>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>
>>>>>>> Should this possibly be PetscScalar?
>>>>>>>
>>>>>>
>>>>>> No.
>>>>>>
>>>>>>
>>>>>>> -  ierr = cudaMalloc((void**) &d_coefficients,         Ne*N_bt * sizeof(float));CHKERRQ(ierr);
>>>>>>> +  ierr = cudaMalloc((void**) &d_coefficients,         Ne*N_bt * sizeof(PetscScalar));CHKERRQ(ierr);
>>>>>>>
>>>>>>>
>>>>>>> SCRGP2$ python $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1 laplacian ex52.h
>>>>>>> ['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py', '2', '1', '1', '1', 'laplacian', 'ex52.h']
>>>>>>> 2 1 1 1 laplacian
>>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]
>>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>>> Perm: [0, 1, 2]
>>>>>>> Creating /home/fuentes/snestutorial/ex52.h
>>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu.h
>>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]
>>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>>> Perm: [0, 1, 2]
>>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu_inline.h
>>>>>>>
>>>>>>> SCRGP2$ make ex52
>>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC
>>>>>>> -I/opt/apps/PETSC/petsc-dev/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>>> -I/opt/apps/cuda/4.1/cuda/include 
>>>>>>> -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>>> -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>>>>>>> nvcc -O0 -g -arch=sm_20  -c --compiler-options="-O0 -g   -fPIC
>>>>>>>  -I/opt/apps/PETSC/petsc-dev/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include
>>>>>>> -I/opt/apps/cuda/4.1/cuda/include 
>>>>>>> -I/opt/apps/PETSC/petsc-dev/include/sieve
>>>>>>> -I/opt/MATLAB/R2011a/extern/include -I/usr/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include
>>>>>>> -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include
>>>>>>> -I/usr/include/mpich2    -D__INSDIR__=src/snes/examples/tutorials/"
>>>>>>>  ex52_integrateElement.cu
>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared
>>>>>>> but never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>>> declared but never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared
>>>>>>> but never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but
>>>>>>> never referenced
>>>>>>>
>>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was
>>>>>>> declared but never referenced
>>>>>>>
>>>>>>> /usr/bin/mpicxx -O0 -g   -o ex52 ex52.o ex52_integrateElement.o
>>>>>>>  
>>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>>> -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib  
>>>>>>> -lpetsc
>>>>>>> -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib
>>>>>>> -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps
>>>>>>> -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs
>>>>>>> -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64
>>>>>>> -lcufft -lcublas -lcudart -lcusparse
>>>>>>> -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>>> -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64
>>>>>>> -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc
>>>>>>> -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib
>>>>>>> -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus
>>>>>>> -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3
>>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt
>>>>>>> -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx
>>>>>>> -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>>>>>> /bin/rm -f ex52.o ex52_integrateElement.o
>>>>>>>
>>>>>>>
>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>>> Residual:
>>>>>>> Vector Object: 1 MPI processes
>>>>>>>   type: seq
>>>>>>> -0.25
>>>>>>> -0.5
>>>>>>> 0.25
>>>>>>> -0.5
>>>>>>> -1
>>>>>>> 0.5
>>>>>>> 0.25
>>>>>>> 0.5
>>>>>>> 0.75
>>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in
>>>>>>> src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>> [0]PETSC ERROR: FormFunctionLocalBatch() line 679 in
>>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>>> [0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in
>>>>>>> src/snes/utils/damgsnes.c
>>>>>>> [0]PETSC ERROR: main() line 1021 in
>>>>>>> src/snes/examples/tutorials/ex52.c
>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0
>>>>>>>
>>>>>>
>>>>>> This is failing on cudaMalloc(), which means your card is not
>>>>>> available for running. Are you trying to run on your laptop?
>>>>>> If so, applications like Preview can lock up the GPU. I know of no
>>>>>> way to check for this from CUDA while running; I just close
>>>>>> apps until it runs.
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <bourdin at lsu.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I had a question about the status of example 52.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c
>>>>>>>>>>
>>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can this example be used with a DM object created from an
>>>>>>>>>> unstructured ExodusII mesh (DMMeshCreateExodus), with the FEM
>>>>>>>>>> assembly done on the GPU?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1) I have pushed many more tests for it now. They can be run using
>>>>>>>>> the Python build system
>>>>>>>>>
>>>>>>>>>   ./config/builder2.py check src/snes/examples/tutorials/ex52.c
>>>>>>>>>
>>>>>>>>>   In fact, you can build any set of files this way.
>>>>>>>>>
>>>>>>>>> 2) The Exodus creation has to be converted to DMComplex from
>>>>>>>>> DMMesh. That should not take me very long. Blaise maintains that
>>>>>>>>>      so maybe there will be help :) You will just replace
>>>>>>>>> DMComplexCreateBoxMesh() with DMComplexCreateExodus(). If you request
>>>>>>>>>      it, I will bump it up the list.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> DMMeshCreateExodusNG is much more flexible than DMMeshCreateExodus
>>>>>>>>> in that it can read meshes with multiple element types, and it should
>>>>>>>>> have a much lower memory footprint. The code should be fairly easy to
>>>>>>>>> read. You can email me directly if you have specific questions. I had
>>>>>>>>> looked at creating a DMComplex and it did not look too difficult, as
>>>>>>>>> long as interpolation is not needed. I have plans to write
>>>>>>>>> DMComplexCreateExodus, but haven't had time to so far. Updating the
>>>>>>>>> Vec viewers and readers may be a bit more involved. In a perfect
>>>>>>>>> world, one would write an EXODUS viewer along the lines of the VTK
>>>>>>>>> and HDF5 ones.
>>>>>>>>>
>>>>>>>>
>>>>>>>> David and Blaise, I have converted this function, now
>>>>>>>> DMComplexCreateExodus(). It's not tested, but I think
>>>>>>>> Blaise has some stuff we can use to test it.
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>
>>>>>>>>      Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>> Blaise
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Let me know if you can run the tests.
>>>>>>>>>
>>>>>>>>>   Thanks
>>>>>>>>>
>>>>>>>>>      Matt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>>> experiments is infinitely more interesting than any results to which
>>>>>>>>> their experiments lead.
>>>>>>>>> -- Norbert Wiener
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> Department of Mathematics and Center for Computation & Technology
>>>>>>>>> Louisiana State University, Baton Rouge, LA 70803, USA
>>>>>>>>> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276
>>>>>>>>> http://www.math.lsu.edu/~bourdin
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
<http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20120328/2ca30beb/attachment-0001.htm>
