Re: [petsc-dev] cusparse error

2020-12-09 Thread Mark Adams
OK, so I will remove my added init(). I guess you just retry the test.
Thanks,

On Wed, Dec 9, 2020 at 9:03 PM Barry Smith  wrote:

>
>   Yes, these messages are caused by a lack of resources, even though the
> error message says "not initialized".
>
>   Barry
>
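
For reference, the "just retry the test" approach could be sketched roughly as follows. This is only an illustration in Python, not how the PETSc CI or test harness actually handles it: `run_with_retry`, the `run` callable, and the attempt/delay parameters are hypothetical names. The only thing taken from this thread is the `CUSPARSE_STATUS_NOT_INITIALIZED` string, which is treated here as a sign of a possibly transient resource shortage.

```python
import time
from typing import Callable, Tuple

# Marker taken from the error in this thread; treated here as "possibly transient".
TRANSIENT_MARKER = "CUSPARSE_STATUS_NOT_INITIALIZED"


def run_with_retry(
    run: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> Tuple[int, int, str]:
    """Call run() until it succeeds or fails in a non-transient way.

    run() is a hypothetical callable returning (exit_code, combined_output).
    Returns (attempts_used, last_exit_code, last_output).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code, output = run()
        if code == 0:
            break  # success, nothing more to do
        if TRANSIENT_MARKER not in output:
            break  # a different failure: retrying would only hide it
        if attempt < max_attempts:
            time.sleep(delay_s)  # give other jobs a chance to release the GPU
    return attempt, code, output
```

In practice `run` would wrap the test command (for example via `subprocess.run`) and return its exit code and captured output. Failures that do not contain the marker are deliberately not retried, so real bugs still surface on the first attempt.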


Re: [petsc-dev] cusparse error

2020-12-09 Thread Barry Smith
 
  Yes, these messages are caused by a lack of resources, even though the error
message says "not initialized".

  Barry


> On Dec 9, 2020, at 8:01 PM, Junchao Zhang  wrote:
> 
> Could be GPU resource competition. Note this test uses nsize=8.
> --Junchao Zhang


Re: [petsc-dev] cusparse error

2020-12-09 Thread Junchao Zhang
Could be GPU resource competition. Note this test uses nsize=8.
--Junchao Zhang


On Wed, Dec 9, 2020 at 7:15 PM Mark Adams  wrote:

> And this is a Cuda 11 complex build:
> https://gitlab.com/petsc/petsc/-/jobs/901108135
>
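
To make the resource-competition point concrete, here is a small Python sketch that counts how many MPI ranks would end up sharing each GPU. It assumes a simple round-robin mapping (`rank % num_devices`); that mapping and the function name `ranks_per_device` are assumptions for illustration, not necessarily what PETSc does, and the number of GPUs on the CI machine is not stated in this thread.

```python
from collections import Counter
from typing import Dict


def ranks_per_device(nsize: int, num_devices: int) -> Dict[int, int]:
    """Count ranks per device under an assumed round-robin mapping.

    nsize is the number of MPI ranks (8 for the test in this thread);
    num_devices is the number of visible GPUs (unknown here, so a parameter).
    """
    if nsize < 0:
        raise ValueError("nsize must be non-negative")
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    # Each rank picks device (rank % num_devices); count the collisions.
    return dict(Counter(rank % num_devices for rank in range(nsize)))
```

With `nsize=8` and a single GPU, all eight ranks create their contexts and library handles on the same device, and any other CI jobs running on that machine add to the load.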


Re: [petsc-dev] cusparse error

2020-12-09 Thread Mark Adams
And this is a Cuda 11 complex build:
https://gitlab.com/petsc/petsc/-/jobs/901108135



[petsc-dev] cusparse error

2020-12-09 Thread Mark Adams
My MR is generating an error. The error message says cuSPARSE has not been
initialized, so I added a cuSPARSE init, but I still get the error
(appended; branch adams/landau-gpu-assembly). Any ideas would be appreciated.

I am trying to reproduce this on Summit, where the test is reported as
exceeding the 60 s timeout limit even though it only runs for a few seconds
(see the timers below). Any ideas?

19:58 adams/landau-gpu-assembly= ~/petsc$ make -f gmakefile test search='ksp_ksp_tutorials-ex71_bddc_cusparse' PETSC_ARCH=arch-summit-opt-gnu-cuda
Using MAKEFLAGS: PETSC_ARCH=arch-summit-opt-gnu-cuda search=ksp_ksp_tutorials-ex71_bddc_cusparse
TEST arch-summit-opt-gnu-cuda/tests/counts/ksp_ksp_tutorials-ex71_bddc_cusparse.counts
not ok ksp_ksp_tutorials-ex71_bddc_cusparse # Exceeded timeout limit of 60 s
 ok ksp_ksp_tutorials-ex71_bddc_cusparse # SKIP Command failed so no diff

# -------------
#   Summary
# -------------
# FAILED ksp_ksp_tutorials-ex71_bddc_cusparse
# success 0/1 tests (0.0%)
# failed 1/1 tests (100.0%)
# todo 0/1 tests (0.0%)
# skip 0/1 tests (0.0%)
#
# Wall clock time for tests: 3 sec
# Approximate CPU time (not incl. build time): 3.14 sec

not ok ksp_ksp_tutorials-ex71_bddc_cusparse # Error code: 201
# [1]PETSC ERROR: --------------------- Error Message ---------------------
# [1]PETSC ERROR: GPU error
# [1]PETSC ERROR: cuSPARSE error 1 (CUSPARSE_STATUS_NOT_INITIALIZED) : initialization error
# [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
# [1]PETSC ERROR: Petsc Development GIT revision: v3.14.2-85-gd60087d GIT Date: 2020-12-09 17:49:59 -0500
# [1]PETSC ERROR: ../ex71 on a named frog by petsc Wed Dec 9 18:41:10 2020
# [1]PETSC ERROR: Configure options --package-prefix-hash=/home/petsc/petsc-hash-pkgs --with-make-test-np=2 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" --with-scalar-type=complex --with-precision=single --with-cuda-dir=/usr/local/cuda-11.0 PETSC_ARCH=arch-ci-linux-cuda11-complex
# [1]PETSC ERROR: #1 MatConvert_SeqAIJ_SeqAIJCUSPARSE() line 2708 in /home/petsc/builds/KFnbdjNX/0/petsc/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu
# [1]PETSC ERROR: #2 MatCreate_SeqAIJCUSPARSE() line 2739 in /home/petsc/builds/KFnbdjNX/0/petsc/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu
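
The mismatch in the Summit run above (a reported 60 s timeout against a 3 s wall clock) can be checked mechanically. The Python sketch below scans harness output of the shape pasted in this message and flags tests whose reported timeout exceeds the total wall-clock time. The function name `suspicious_timeouts` and the regular expressions are my own assumptions based only on the lines shown here, not part of the PETSc test harness.

```python
import re
from typing import List, Optional, Tuple

# Patterns inferred from the output pasted in this thread (assumed format).
NOT_OK = re.compile(r"^\s*not ok (\S+) # (.+)$")
WALL = re.compile(r"^# Wall clock time for tests: (\d+) sec")
TIMEOUT = re.compile(r"Exceeded timeout limit of (\d+) s")

SAMPLE = """\
not ok ksp_ksp_tutorials-ex71_bddc_cusparse # Exceeded timeout limit of 60 s
 ok ksp_ksp_tutorials-ex71_bddc_cusparse # SKIP Command failed so no diff
# FAILED ksp_ksp_tutorials-ex71_bddc_cusparse
# Wall clock time for tests: 3 sec
"""


def suspicious_timeouts(log: str) -> List[Tuple[str, int, int]]:
    """Return (test, timeout_s, wall_s) for timeouts longer than the whole run."""
    failures: List[Tuple[str, str]] = []
    wall: Optional[int] = None
    for line in log.splitlines():
        m = NOT_OK.match(line)
        if m:
            failures.append((m.group(1), m.group(2).strip()))
            continue
        m = WALL.match(line)
        if m:
            wall = int(m.group(1))
    flagged = []
    for name, reason in failures:
        t = TIMEOUT.search(reason)
        # A test cannot have run past its limit if the whole run was shorter.
        if t and wall is not None and wall < int(t.group(1)):
            flagged.append((name, int(t.group(1)), wall))
    return flagged
```

A flagged entry suggests the "timeout" label is not describing what really happened to the test, which fits the observation that the run only took a few seconds.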