Hi John,

Please ignore my last post, which speculated that the bug was in GetPot. There was an error in my test code.

So, back to my original question and your reply in the first round, let me clarify a few things:

1. I simply passed argv to LibMeshInit(). I did no preprocessing of argv.

2. As you pointed out, the error occurred at libmesh.C, line 356, which is "libmesh_assert(remote_elem);". This line then caused a problem with ostream. Although this line is unlikely to cause a seg fault, as you said, the error did occur here.

3. This error occurred on only one cluster. I did not encounter the seg fault on two other machines. All three machines were running libmesh 0.9.2 in RelWithDebInfo mode. Such inconsistent behavior across platforms is confusing.

Here is the latest error message (passing argv to LibMeshInit(), without "--keep-cout"):

==3239== Process terminating with default action of signal 11 (SIGSEGV)
==3239==  Access not within mapped region at address 0x0
==3239==    at 0x3542F853F8: std::ostream::sentry::sentry(std::ostream&) (in /usr/lib64/libstdc++.so.6.0.3)
==3239==    by 0x3542F8557D: std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (in /usr/lib64/libstdc++.so.6.0.3)
==3239==    by 0x6D18491: libMesh::BasicOStreamProxy<char, std::char_traits<char> >& libMesh::BasicOStreamProxy<char, std::char_traits<char> >::operator<< <char[32]>(char[32] const&) (ostream_proxy.h:124)
==3239==    by 0x6D76ECB: _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t (libmesh.C:356)
==3239==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int, char const* const*, ompi_communicator_t*) (libmesh.C:0)
==3239==    by 0x4387B3: main (main_RealHeart.cpp:303)

Cheers,
Dafang

On 05/01/2014 04:27 PM, John Peterson wrote:
>
> On Thu, May 1, 2014 at 2:04 PM, Dafang Wang <[email protected] <mailto:[email protected]>> wrote:
>
>     Hi John,
>
>     After a second check I think the bug that led to seg fault lies in
>     the LibmeshInit class, perhaps its constructor.
>     As the updated error message shows below, line 328 of libmesh.C is
>     "command_line.reset (new GetPot (argc, argv));", which creates a
>     GetPot object.
>
>     Also FYI, line 295 of my user code main_RealHeart.cpp is
>     "LibMeshInit init (argc, argv);".
>
>     ==15661== Invalid read of size 8
>     ==15661==    at 0x6D675C9: GetPot::parse_command_line(int, char const* const*, char const*) (getpot.h:558)
>     ==15661==    by 0x6D67312: _ZN6GetPotC9EiPKPKcS1_ (getpot.h:536)
>     ==15661==    by 0x6D6746D: GetPot::GetPot(int, char const* const*, char const*) (getpot.h:65536)
>     ==15661==    by 0x6D76B97: _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t (libmesh.C:328)
>     ==15661==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int, char const* const*, ompi_communicator_t*) (libmesh.C:0)
>     ==15661==    by 0x4387B0: main (main_RealHeart.cpp:295)
>     ==15661==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
>     ==15661==
>     ==15661== Process terminating with default action of signal 11 (SIGSEGV)
>     ==15661==  Access not within mapped region at address 0x0
>     ==15661==    at 0x6D675C9: GetPot::parse_command_line(int, char const* const*, char const*) (getpot.h:558)
>     ==15661==    by 0x6D67312: _ZN6GetPotC9EiPKPKcS1_ (getpot.h:536)
>     ==15661==    by 0x6D6746D: GetPot::GetPot(int, char const* const*, char const*) (getpot.h:65536)
>     ==15661==    by 0x6D76B97: _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t (libmesh.C:328)
>     ==15661==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int, char const* const*, ompi_communicator_t*) (libmesh.C:0)
>     ==15661==    by 0x4387B0: main (main_RealHeart.cpp:295)
>
> Do you process argv at all before you pass it to LibMeshInit? What is
> your prototype for main()?
>
> In this stack trace, the segfault seems to be from line 558 of
> getpot.h in the parse_command_line() function.
>
> We have made a couple of small fixes in getpot.h since 0.9.2.2 came
> out, you might want to try the most recent version from libmesh master
> and see if it works for you...
>
> --
> John

--
Dafang Wang, Ph.D
Postdoctoral Fellow
Institute of Computational Medicine
Department of Biomedical Engineering
Johns Hopkins University
Hackerman Hall Room 218
Baltimore, MD, 21218

_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
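
Editorial note on John's question about whether argv is processed before it reaches LibMeshInit: one way to rule out a malformed argument vector on the problem cluster is to check it at the top of main(). The following is only a minimal sketch. It assumes nothing beyond what the C++ standard guarantees for main() (argv[0] through argv[argc-1] are non-null strings and argv[argc] is a null pointer). The helper names are hypothetical and are not part of libMesh or GetPot. A passing check would only exclude a bad argv; it would not by itself explain the null access inside std::ostream::sentry reported at libmesh.C:356.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a libMesh or GetPot function): returns true when
// argc/argv have the shape the C++ standard guarantees for main().
bool argv_is_well_formed(int argc, const char* const* argv)
{
  if (argc < 0 || argv == NULL)
    return false;

  // Every entry below argc must point to a string; a command-line parser
  // that walks the array would otherwise dereference a null pointer.
  for (int i = 0; i < argc; ++i)
    if (argv[i] == NULL)
      return false;

  // The standard requires the array to be terminated by a null pointer.
  return argv[argc] == NULL;
}

// Hypothetical convenience wrapper: stops in debug builds before the
// arguments are handed to any library. Compiles to nothing under NDEBUG.
void assert_argv_is_well_formed(int argc, const char* const* argv)
{
  assert(argv_is_well_formed(argc, argv));
  (void)argc;  // avoid unused-parameter warnings when asserts are disabled
  (void)argv;
}
```

With a prototype of "int main(int argc, char** argv)", the wrapper could be called as "assert_argv_is_well_formed(argc, argv);" on the line before "LibMeshInit init (argc, argv);". If it fires only on the one cluster, the launcher or MPI wrapper there would be the first thing to look at.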
