[petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On this machine one has to use a script to submit jobs, so why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang


Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
--Junchao Zhang
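For anyone hitting the same hang: a configure script in the style of PETSc's config/examples/arch-olcf-summit-opt.py with this workaround added might look roughly like the sketch below. Only --with-mpiexec=/bin/true comes from this thread; the other option is a placeholder, not the actual Summit settings.

```python
#!/usr/bin/env python
# Hypothetical configure wrapper in the style of config/examples/*.py.
# Only --with-mpiexec=/bin/true is from this thread; the rest is a placeholder.
if __name__ == '__main__':
    import sys, os
    sys.path.insert(0, os.path.abspath('config'))
    import configure
    configure_options = [
        '--with-mpiexec=/bin/true',  # never actually run mpiexec during configure
        '--with-debugging=0',        # placeholder optimization setting
    ]
    configure.petsc_configure(configure_options)
```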




Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Mills, Richard Tran via petsc-dev
Hi Junchao,

Glad you've found a workaround, but I don't know why you are hitting this 
problem. The last time I built PETSc on Summit (just a couple days ago), I 
didn't have this problem. I'm working from the example template that's in the 
PETSc repo at config/examples/arch-olcf-summit-opt.py.

Can you point me to your configure script on Summit so I can try to reproduce 
your problem?

--Richard




Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


--with-batch is still there and should be used in such circumstances. The 
difference is that --with-batch no longer generates a program that you need to 
submit to the batch system before continuing the configure. Instead, 
--with-batch guesses at and skips some of the tests (with clear warnings on how 
you can adjust the guesses).

 Regarding the hanging: this happens because the thread monitoring of 
configure-started executables was removed years ago, since it was slow and 
occasionally buggy (the default wait was an absurd 10 minutes, too). Thus, when 
configure tried to test an mpiexec that hung, the test would hang. There is 
code in one of my branches, which I've been struggling to get into master for a 
long time, that puts back the thread monitoring for this one call with a small 
timeout, so you should never see this hang again.
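In BuildSystem's Python, the small-timeout monitoring described above might look roughly like this sketch (illustrative only; run_with_timeout is not the actual BuildSystem API):

```python
import subprocess

def run_with_timeout(cmd, timeout_sec):
    """Run cmd, returning its stdout, or None if it does not finish in time."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_sec)
        return result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child (e.g. a hanging mpiexec) on timeout,
        # so configure can report a warning instead of blocking forever.
        return None
```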

  Barry

   We could be a little clever and have configure detect that it is on a Cray or 
other batch system and automatically add the batch option. That would be a nice 
little feature for someone to add; probably just a few lines of code.
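A detection along these lines could, for example, look for scheduler environment variables on the login node. The marker names below are common ones for LSF/PBS/Slurm/Cray and are illustrative, not an exhaustive or vetted list:

```python
import os

# Environment variables commonly set on batch/Cray systems (illustrative list).
BATCH_MARKERS = ('LSF_ENVDIR', 'LSB_JOBID', 'PBS_JOBID', 'SLURM_JOB_ID',
                 'CRAYPE_VERSION')

def looks_like_batch_system(env=None):
    """Guess whether we are on a batch-scheduled system by checking env vars."""
    env = os.environ if env is None else env
    return any(marker in env for marker in BATCH_MARKERS)
```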
   




Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Richard,
  I almost copied arch-olcf-summit-opt.py. The hanging is random: I ran into it 
a few weeks ago, retried, and it passed. It happened again today when I did a 
fresh configure.
  On the Summit login nodes, mpiexec is actually in everyone's PATH. I ran 
"ps ux" and found the configure script was executing "mpiexec ... "
--Junchao Zhang





Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Mills, Richard Tran via petsc-dev
Everything that Barry says about '--with-batch' is valid, but let me point out 
one thing about Summit: You don't need "--with-batch" at all, because the 
Summit login/compile nodes run the same hardware (minus the GPUs) and software 
stack as the back-end compute nodes. This makes configuring and building 
software far, far easier than we are used to on the big LCF machines. I was 
actually shocked when I found this out -- I'd gotten so used to struggling with 
cross-compilers, etc.

--Richard



Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


  Then the hang is curious.
