Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-18 Thread Burlen Loring

Yeah, it's a strange one.

The clues we have at this point are the 5 ctests that have been failing 
on the Nautilus dashboard 
(http://open.cdash.org/viewTest.php?onlyfailed&buildid=2719388), and the 
fact that 3.14.1 doesn't have the issue (it's still running OK).


I'll see if I can narrow down when the issue started, and if the MPI 
binaries have debugging symbols I'll see if I can step through them.


On 12/18/2012 11:42 AM, Utkarsh Ayachit wrote:

That's really odd. Looking at the call stacks, it looks like the
code on both processes is at the right location: both are calling
MPI_Finalize(). I verified that MPI_Finalize() does indeed get called
once (by adding a break point on MPI_Finalize in pvbatch). Burlen, can
you peek into the files (finalize.c, adi.c etc.) to see if we can spot
why the two processes diverge?

Utkarsh

On Fri, Dec 7, 2012 at 3:13 PM, Burlen Loring  wrote:

#5  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27


___
Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: 
http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview


Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-18 Thread Utkarsh Ayachit
That's really odd. Looking at the call stacks, it looks like the
code on both processes is at the right location: both are calling
MPI_Finalize(). I verified that MPI_Finalize() does indeed get called
once (by adding a break point on MPI_Finalize in pvbatch). Burlen, can
you peek into the files (finalize.c, adi.c etc.) to see if we can spot
why the two processes diverge?

Utkarsh

On Fri, Dec 7, 2012 at 3:13 PM, Burlen Loring  wrote:
> #5  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27
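
For reference, the breakpoint check Utkarsh describes can be reproduced
roughly like this (the script name, rank count, and use of xterm per rank
are placeholders, not commands from the thread):

```
# one gdb per rank; script.py and -np 2 are placeholders
mpirun -np 2 xterm -e gdb --args pvbatch script.py
(gdb) break MPI_Finalize
(gdb) run
...
(gdb) backtrace
```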


Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-10 Thread Kyle Lutz
On Fri, Dec 7, 2012 at 12:13 PM, Burlen Loring  wrote:
> Hi Kyle et al.
>
> below are stack traces where PV is hung. I'm stumped by this, and can get no
> foothold. I still have one chance if we can get valgrind to run with MPI on
> nautilus. But it's a long shot, valgrinding pvbatch on my local system
> throws many hundreds of errors. I'm not sure which of these are valid
> reports.
>
> PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a
> change in 3.98 that may account for the new hang?
>
> Burlen
>
> rank 0
> #0  0x2b0762b3f590 in gru_get_next_message () from
> /usr/lib64/libgru.so.0
> #1  0x2b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
> #2  0x2b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
> #3  MPI_SGI_progress () at progress.c:207
> #4  0x2b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
> #5  0x2b073a2b8bee in MPI_SGI_finalize () at adi.c:667
> #6  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27
> #7  0x2b073969d96f in vtkProcessModule::Finalize () at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
> #8  0x2b0737bb0f9e in vtkInitializationHelper::Finalize () at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
> #9  0x00403c50 in ParaViewPython::Run (processType=4, argc=2,
> argv=0x7fff06195c88) at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
> #10 0x00403cd5 in main (argc=2, argv=0x7fff06195c88) at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
>
> rank 1
> #0  0x2b07391bde70 in __nanosleep_nocancel () from
> /lib64/libpthread.so.0
> #1  0x2b073a32c898 in MPI_SGI_millisleep (milliseconds=<optimized out>) at sleep.c:34
> #2  0x2b073a326365 in MPI_SGI_slow_request_wait (request=0x7fff061959f8,
> status=0x7fff061959d0, set=0x7fff061959f4, gen_rc=0x7fff061959f0) at
> req.c:1460
> #3  0x2b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
> #4  0x2b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
> #5  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27
> #6  0x2b073969d96f in vtkProcessModule::Finalize () at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
> #7  0x2b0737bb0f9e in vtkInitializationHelper::Finalize () at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
> #8  0x00403c50 in ParaViewPython::Run (processType=4, argc=2,
> argv=0x7fff06195c88) at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
> #9  0x00403cd5 in main (argc=2, argv=0x7fff06195c88) at
> /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21

Hi Burlen,

Thanks for getting these. I'll take a closer look today and see what I can find.

-kyle

>
>
>
> On 12/04/2012 05:15 PM, Burlen Loring wrote:
>>
>> Hi Kyle,
>>
>> I was wrong about MPI_Finalize being invoked twice, I had misread the
>> code. I'm not sure why pvbatch is hanging in MPI_Finalize on Nautilus. I
>> haven't been able to find anything in the debugger. This is new for 3.98.
>>
>> Burlen
>>
>> On 12/03/2012 07:36 AM, Kyle Lutz wrote:
>>>
>>> Hi Burlen,
>>>
>>> On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring  wrote:

 it looks like pvserver is also impacted, hanging after the gui
 disconnects.


 On 11/28/2012 12:53 PM, Burlen Loring wrote:
>
> Hi All,
>
> some parallel tests have been failing for some time on Nautilus.
> http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614
>
> There are MPI calls made after finalize which cause deadlock issues on SGI
> MPT. It affects pvbatch for sure. The following snippet shows the bug, and
> the bug report is here: http://paraview.org/Bug/view.php?id=13690
>
>
>
> //
> bool vtkProcessModule::Finalize()
> {
>
>...
>
>vtkProcessModule::GlobalController->Finalize(1);<---mpi_finalize
> called here
>>>
>>> This shouldn't be calling MPI_Finalize() as the finalizedExternally
>>> argument is 1 and in vtkMPIController::Finalize():
>>>
>>>  if (finalizedExternally == 0)
>>>{
>>>MPI_Finalize();
>>>}
>>>
>>> So my guess is that it's being invoked elsewhere.
>>>
>...
>
> #ifdef PARAVIEW_USE_MPI
>if (vtkProcessModule::FinalizeMPI)
>  {
>  MPI_Barrier(MPI_COMM_WORLD);<-barrier
> after
> mpi_finalize
>  MPI_Finalize();<--second
> mpi_finalize
>

Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-07 Thread Burlen Loring

Hi Kyle et al.

below are stack traces where PV is hung. I'm stumped by this, and can 
get no foothold. I still have one chance if we can get valgrind to run 
with MPI on nautilus. But it's a long shot, valgrinding pvbatch on my 
local system throws many hundreds of errors. I'm not sure which of these 
are valid reports.
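
For the record, a typical way to run valgrind on an MPI program is one
valgrind instance per rank with per-PID log files; the rank count, script
name, and suppression-file path below are placeholders (an MPI vendor
suppression file, if one ships, cuts the noise considerably):

```
mpirun -np 2 valgrind --log-file=pvbatch.vg.%p \
    --suppressions=/path/to/mpi.supp pvbatch script.py
```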


PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a 
change in 3.98 that may account for the new hang?


Burlen

rank 0
#0  0x2b0762b3f590 in gru_get_next_message () from 
/usr/lib64/libgru.so.0

#1  0x2b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
#2  0x2b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
#3  MPI_SGI_progress () at progress.c:207
#4  0x2b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
#5  0x2b073a2b8bee in MPI_SGI_finalize () at adi.c:667
#6  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#7  0x2b073969d96f in vtkProcessModule::Finalize () at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#8  0x2b0737bb0f9e in vtkInitializationHelper::Finalize () at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#9  0x00403c50 in ParaViewPython::Run (processType=4, argc=2, 
argv=0x7fff06195c88) at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#10 0x00403cd5 in main (argc=2, argv=0x7fff06195c88) at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21


rank 1
#0  0x2b07391bde70 in __nanosleep_nocancel () from 
/lib64/libpthread.so.0
#1  0x2b073a32c898 in MPI_SGI_millisleep (milliseconds=<optimized out>) at sleep.c:34
#2  0x2b073a326365 in MPI_SGI_slow_request_wait 
(request=0x7fff061959f8, status=0x7fff061959d0, set=0x7fff061959f4, 
gen_rc=0x7fff061959f0) at req.c:1460

#3  0x2b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
#4  0x2b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
#5  0x2b073a2e3c04 in PMPI_Finalize () at finalize.c:27
#6  0x2b073969d96f in vtkProcessModule::Finalize () at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
#7  0x2b0737bb0f9e in vtkInitializationHelper::Finalize () at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
#8  0x00403c50 in ParaViewPython::Run (processType=4, argc=2, 
argv=0x7fff06195c88) at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
#9  0x00403cd5 in main (argc=2, argv=0x7fff06195c88) at 
/sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21



On 12/04/2012 05:15 PM, Burlen Loring wrote:

Hi Kyle,

I was wrong about MPI_Finalize being invoked twice, I had misread 
the code. I'm not sure why pvbatch is hanging in MPI_Finalize on 
Nautilus. I haven't been able to find anything in the debugger. This 
is new for 3.98.


Burlen

On 12/03/2012 07:36 AM, Kyle Lutz wrote:

Hi Burlen,

On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring  wrote:
it looks like pvserver is also impacted, hanging after the gui 
disconnects.



On 11/28/2012 12:53 PM, Burlen Loring wrote:

Hi All,

some parallel tests have been failing for some time on Nautilus.
http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614

There are MPI calls made after finalize which cause deadlock issues 
on SGI MPT. It affects pvbatch for sure. The following snippet shows 
the bug, and the bug report is here: http://paraview.org/Bug/view.php?id=13690


// 


bool vtkProcessModule::Finalize()
{

   ...

   vtkProcessModule::GlobalController->Finalize(1);<---mpi_finalize
called here

This shouldn't be calling MPI_Finalize() as the finalizedExternally
argument is 1 and in vtkMPIController::Finalize():

 if (finalizedExternally == 0)
   {
   MPI_Finalize();
   }

So my guess is that it's being invoked elsewhere.


   ...

#ifdef PARAVIEW_USE_MPI
   if (vtkProcessModule::FinalizeMPI)
 {
 MPI_Barrier(MPI_COMM_WORLD);<-barrier 
after

mpi_finalize
 MPI_Finalize();<--second
mpi_finalize
 }
#endif

I've made a patch which should prevent this section of code from ever
being called twice by setting the FinalizeMPI flag to false after
calling MPI_Finalize(). Can you take a look here:
http://review.source.kitware.com/#/t/1808/ and let me know if that
helps the issue.

Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
get a backtrace of where it gets invoked for the second time? That
would be very helpful in tracking down the problem.

Thanks,
Kyle





Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-04 Thread Burlen Loring

Hi Kyle,

I was wrong about MPI_Finalize being invoked twice, I had misread the 
code. I'm not sure why pvbatch is hanging in MPI_Finalize on Nautilus. I 
haven't been able to find anything in the debugger. This is new for 3.98.


Burlen

On 12/03/2012 07:36 AM, Kyle Lutz wrote:

Hi Burlen,

On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring  wrote:

it looks like pvserver is also impacted, hanging after the gui disconnects.


On 11/28/2012 12:53 PM, Burlen Loring wrote:

Hi All,

some parallel tests have been failing for some time on Nautilus.
http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614

There are MPI calls made after finalize which cause deadlock issues on SGI
MPT. It affects pvbatch for sure. The following snippet shows the bug, and
the bug report is here: http://paraview.org/Bug/view.php?id=13690


//
bool vtkProcessModule::Finalize()
{

   ...

   vtkProcessModule::GlobalController->Finalize(1);<---mpi_finalize
called here

This shouldn't be calling MPI_Finalize() as the finalizedExternally
argument is 1 and in vtkMPIController::Finalize():

 if (finalizedExternally == 0)
   {
   MPI_Finalize();
   }

So my guess is that it's being invoked elsewhere.


   ...

#ifdef PARAVIEW_USE_MPI
   if (vtkProcessModule::FinalizeMPI)
 {
 MPI_Barrier(MPI_COMM_WORLD);<-barrier after
mpi_finalize
 MPI_Finalize();<--second
mpi_finalize
 }
#endif

I've made a patch which should prevent this section of code from ever
being called twice by setting the FinalizeMPI flag to false after
calling MPI_Finalize(). Can you take a look here:
http://review.source.kitware.com/#/t/1808/ and let me know if that
helps the issue.

Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
get a backtrace of where it gets invoked for the second time? That
would be very helpful in tracking down the problem.

Thanks,
Kyle




Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-12-03 Thread Kyle Lutz
Hi Burlen,

On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring  wrote:
> it looks like pvserver is also impacted, hanging after the gui disconnects.
>
>
> On 11/28/2012 12:53 PM, Burlen Loring wrote:
>>
>> Hi All,
>>
>> some parallel tests have been failing for some time on Nautilus.
>> http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614
>>
>> There are MPI calls made after finalize which cause deadlock issues on SGI
>> MPT. It affects pvbatch for sure. The following snippet shows the bug, and
>> the bug report is here: http://paraview.org/Bug/view.php?id=13690
>>
>>
>> //
>> bool vtkProcessModule::Finalize()
>> {
>>
>>   ...
>>
>>   vtkProcessModule::GlobalController->Finalize(1); <---mpi_finalize
>> called here

This shouldn't be calling MPI_Finalize() as the finalizedExternally
argument is 1 and in vtkMPIController::Finalize():

if (finalizedExternally == 0)
  {
  MPI_Finalize();
  }

So my guess is that it's being invoked elsewhere.

>>
>>   ...
>>
>> #ifdef PARAVIEW_USE_MPI
>>   if (vtkProcessModule::FinalizeMPI)
>> {
>> MPI_Barrier(MPI_COMM_WORLD); <-barrier after
>> mpi_finalize
>> MPI_Finalize(); <--second
>> mpi_finalize
>> }
>> #endif

I've made a patch which should prevent this section of code from ever
being called twice by setting the FinalizeMPI flag to false after
calling MPI_Finalize(). Can you take a look here:
http://review.source.kitware.com/#/t/1808/ and let me know if that
helps the issue.

Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
get a backtrace of where it gets invoked for the second time? That
would be very helpful in tracking down the problem.

Thanks,
Kyle


Re: [Paraview] 3.98 MPI_Finalize out of order in pvbatch

2012-11-29 Thread Burlen Loring

it looks like pvserver is also impacted, hanging after the gui disconnects.

On 11/28/2012 12:53 PM, Burlen Loring wrote:

Hi All,

some parallel tests have been failing for some time on Nautilus.
http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614

There are MPI calls made after finalize which cause deadlock issues on 
SGI MPT. It affects pvbatch for sure. The following snippet shows the 
bug, and bug report here: http://paraview.org/Bug/view.php?id=13690


// 


bool vtkProcessModule::Finalize()
{

  ...

  vtkProcessModule::GlobalController->Finalize(1); 
<---mpi_finalize called here


  ...

#ifdef PARAVIEW_USE_MPI
  if (vtkProcessModule::FinalizeMPI)
{
MPI_Barrier(MPI_COMM_WORLD); <-barrier 
after mpi_finalize
MPI_Finalize(); <--second 
mpi_finalize

}
#endif

  ...
}

Burlen



