On Fri, Dec 7, 2012 at 12:13 PM, Burlen Loring <blor...@lbl.gov> wrote:
> Hi Kyle et al.,
>
> Below are stack traces where PV is hung. I'm stumped by this and can get
> no foothold. I still have one chance if we can get valgrind to run with
> MPI on Nautilus, but it's a long shot: valgrinding pvbatch on my local
> system throws many hundreds of errors, and I'm not sure which of those
> are valid reports.
>
> PV 3.14.1 doesn't hang in pvbatch, so I'm wondering if anyone knows of a
> change in 3.98 that may account for the new hang?
>
> Burlen
>
> rank 0
> #0  0x00002b0762b3f590 in gru_get_next_message () from /usr/lib64/libgru.so.0
> #1  0x00002b073a2f4bd2 in MPI_SGI_grudev_progress () at grudev.c:1780
> #2  0x00002b073a31cc25 in MPI_SGI_progress_devices () at progress.c:93
> #3  MPI_SGI_progress () at progress.c:207
> #4  0x00002b073a3244eb in MPI_SGI_request_finalize () at req.c:1548
> #5  0x00002b073a2b8bee in MPI_SGI_finalize () at adi.c:667
> #6  0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
> #7  0x00002b073969d96f in vtkProcessModule::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
> #8  0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
> #9  0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
> #10 0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
>
> rank 1
> #0  0x00002b07391bde70 in __nanosleep_nocancel () from /lib64/libpthread.so.0
> #1  0x00002b073a32c898 in MPI_SGI_millisleep (milliseconds=<value optimized out>) at sleep.c:34
> #2  0x00002b073a326365 in MPI_SGI_slow_request_wait (request=0x7fff061959f8, status=0x7fff061959d0, set=0x7fff061959f4, gen_rc=0x7fff061959f0) at req.c:1460
> #3  0x00002b073a2c6ef3 in MPI_SGI_slow_barrier (comm=1) at barrier.c:275
> #4  0x00002b073a2b8bf8 in MPI_SGI_finalize () at adi.c:671
> #5  0x00002b073a2e3c04 in PMPI_Finalize () at finalize.c:27
> #6  0x00002b073969d96f in vtkProcessModule::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ClientServerCore/Core/vtkProcessModule.cxx:229
> #7  0x00002b0737bb0f9e in vtkInitializationHelper::Finalize () at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/ParaViewCore/ServerManager/SMApplication/vtkInitializationHelper.cxx:145
> #8  0x0000000000403c50 in ParaViewPython::Run (processType=4, argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvpython.h:124
> #9  0x0000000000403cd5 in main (argc=2, argv=0x7fff06195c88) at /sw/analysis/paraview/3.98/sles11.1_intel11.1.038/ParaView/CommandLineExecutables/pvbatch.cxx:21
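Both traces above sit inside the first PMPI_Finalize: rank 0 is draining GRU device messages during request cleanup while rank 1 spins in MPT's finalize-time barrier. That is consistent with the post-finalize MPI calls flagged in the bug report quoted further down. For reference, a minimal standalone sketch of that erroneous pattern (a two-rank illustration only, not ParaView code; whether 3.98 actually reaches such a path is exactly what's in question in this thread):

    // post_finalize.cxx -- illustrative sketch only, not from ParaView.
    // Build/run: mpicxx post_finalize.cxx -o post_finalize
    //            mpirun -np 2 ./post_finalize
    #include <mpi.h>

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);

      // Normal shutdown: every rank calls MPI_Finalize() exactly once.
      MPI_Finalize();

      // Erroneous: the MPI standard forbids further MPI calls after
      // MPI_Finalize(). On implementations such as SGI MPT this tends
      // to hang (e.g. in the finalize barrier) rather than fail cleanly.
      MPI_Barrier(MPI_COMM_WORLD);
      MPI_Finalize();

      return 0;
    }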
Hi Burlen,

Thanks for getting these. I'll take a closer look today and see what I can find.

-kyle

> On 12/04/2012 05:15 PM, Burlen Loring wrote:
>> Hi Kyle,
>>
>> I was wrong about MPI_Finalize being invoked twice; I had misread the
>> code. I'm not sure why pvbatch is hanging in MPI_Finalize on Nautilus.
>> I haven't been able to find anything in the debugger. This is new for
>> 3.98.
>>
>> Burlen
>>
>> On 12/03/2012 07:36 AM, Kyle Lutz wrote:
>>> Hi Burlen,
>>>
>>> On Thu, Nov 29, 2012 at 1:27 PM, Burlen Loring <blor...@lbl.gov> wrote:
>>>> It looks like pvserver is also impacted, hanging after the GUI
>>>> disconnects.
>>>>
>>>> On 11/28/2012 12:53 PM, Burlen Loring wrote:
>>>>> Hi All,
>>>>>
>>>>> Some parallel tests have been failing for some time on Nautilus:
>>>>> http://open.cdash.org/viewTest.php?onlyfailed&buildid=2684614
>>>>>
>>>>> There are MPI calls made after finalize which cause deadlock issues
>>>>> on SGI MPT. It affects pvbatch for sure. The following snippet shows
>>>>> the bug; the bug report is here:
>>>>> http://paraview.org/Bug/view.php?id=13690
>>>>>
>>>>> //----------------------------------------------------------------------------
>>>>> bool vtkProcessModule::Finalize()
>>>>> {
>>>>>   ...
>>>>>   vtkProcessModule::GlobalController->Finalize(1); <------- MPI_Finalize called here
>>>
>>> This shouldn't be calling MPI_Finalize(), as the finalizedExternally
>>> argument is 1, and in vtkMPIController::Finalize():
>>>
>>>   if (finalizedExternally == 0)
>>>     {
>>>     MPI_Finalize();
>>>     }
>>>
>>> So my guess is that it's being invoked elsewhere.
>>>
>>>>>   ...
>>>>>
>>>>> #ifdef PARAVIEW_USE_MPI
>>>>>   if (vtkProcessModule::FinalizeMPI)
>>>>>     {
>>>>>     MPI_Barrier(MPI_COMM_WORLD); <------- barrier after MPI_Finalize
>>>>>     MPI_Finalize();              <------- second MPI_Finalize
>>>>>     }
>>>>> #endif
>>>
>>> I've made a patch which should prevent this section of code from ever
>>> being executed twice by setting the FinalizeMPI flag to false after
>>> calling MPI_Finalize(). Can you take a look here:
>>> http://review.source.kitware.com/#/t/1808/ and let me know if that
>>> helps the issue?
>>>
>>> Otherwise, would you be able to set a breakpoint on MPI_Finalize() and
>>> get a backtrace of where it gets invoked the second time? That would
>>> be very helpful in tracking down the problem.
>>>
>>> Thanks,
>>> Kyle
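The fix Kyle mentions above (http://review.source.kitware.com/#/t/1808/) amounts to making the MPI shutdown idempotent. A sketch of that guard pattern, written standalone so it compiles outside ParaView -- the finalizeMPI flag here stands in for vtkProcessModule::FinalizeMPI, and the MPI_Finalized() check is a defensive addition of this sketch, not necessarily part of the actual patch:

    #include <mpi.h>

    // Stands in for vtkProcessModule::FinalizeMPI: true when this layer
    // owns MPI shutdown.
    static bool finalizeMPI = true;

    void FinalizeMPIOnce()
    {
      if (!finalizeMPI)
        {
        return; // a second call becomes a no-op instead of a hang
        }
      // Clear the flag *before* finalizing so that a re-entrant call can
      // never reach MPI_Finalize() twice.
      finalizeMPI = false;

      // MPI_Finalized() may be called at any time, even after
      // MPI_Finalize(), so it is a safe extra guard.
      int finalized = 0;
      MPI_Finalized(&finalized);
      if (!finalized)
        {
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
        }
    }

With something along these lines in place, even a code path that runs the shutdown twice (which is what the traces first led us to suspect) would only ever issue one MPI_Finalize().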