Hi John,

Please ignore my last post, which speculated that the bug was in GetPot. 
There was an error in my test code.

So back to my original question and your reply in the first round, let 
me clarify a few things:

1. I simply passed argv to LibMeshInit(). I did no preprocessing of argv.

2. As you pointed out, the error is reported at libmesh.C, line 356, which 
is "libmesh_assert(remote_elem);". The call made from this line then fails 
inside ostream. Although this line is unlikely to cause a seg fault, as 
you said, this is where the error occurred.

3. This error occurred on only one cluster. I did not encounter the 
seg fault on two other machines. All three machines were running 
libmesh 0.9.2 built in RelWithDebInfo mode. Such inconsistent 
behavior across platforms is confusing.
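
One way to rule out a malformed argv on the failing cluster is a small 
check run just before the LibMeshInit constructor. The following is only a 
sketch: argv_looks_sane is a hypothetical helper, not a libMesh function, 
and it tests only what the C++ standard guarantees for the arguments of 
main() (non-null entries and a null argv[argc]).

```cpp
#include <cstddef>

// Hypothetical helper (not part of libMesh): returns true if argv has the
// shape the C++ standard guarantees for the arguments of main().
bool argv_looks_sane(int argc, const char* const* argv)
{
  if (argc < 0 || argv == NULL)
    return false;

  // Every one of the first argc entries must be a valid (non-null) string.
  for (int i = 0; i < argc; ++i)
    if (argv[i] == NULL)
      return false;

  // The standard requires argv[argc] to be a null pointer.
  return argv[argc] == NULL;
}
```

If this returned false in main() right before "LibMeshInit init (argc, 
argv);", the problem would lie upstream of libMesh (for example in the 
launcher); if it returned true, argv itself would be an unlikely culprit.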

Here is the latest error message (passing argv to LibMeshInit(), 
without "--keep-cout"):
==3239== Process terminating with default action of signal 11 (SIGSEGV)
==3239==  Access not within mapped region at address 0x0
==3239==    at 0x3542F853F8: std::ostream::sentry::sentry(std::ostream&) (in /usr/lib64/libstdc++.so.6.0.3)
==3239==    by 0x3542F8557D: std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (in /usr/lib64/libstdc++.so.6.0.3)
==3239==    by 0x6D18491: libMesh::BasicOStreamProxy<char, std::char_traits<char> >& libMesh::BasicOStreamProxy<char, std::char_traits<char> >::operator<< <char[32]>(char[32] const&) (ostream_proxy.h:124)
==3239==    by 0x6D76ECB: _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t (libmesh.C:356)
==3239==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int, char const* const*, ompi_communicator_t*) (libmesh.C:0)
==3239==    by 0x4387B3: main (main_RealHeart.cpp:303)

Cheers,
Dafang

On 05/01/2014 04:27 PM, John Peterson wrote:
>
>
>
> On Thu, May 1, 2014 at 2:04 PM, Dafang Wang <[email protected]> wrote:
>
>     Hi John,
>
>     After a second check I think the bug that led to seg fault lies in
>     the LibmeshInit class, perhaps its constructor. As the updated
>     error message shows below, line 328 of libmesh.C is
>     "command_line.reset (new GetPot (argc, argv));", which creates a
>     GetPot object.
>
>     Also FYI, line 295 of my user code main_RealHeart.cpp is
>     "LibMeshInit init (argc, argv);".
>
>     ==15661== Invalid read of size 8
>     ==15661==    at 0x6D675C9: GetPot::parse_command_line(int, char
>     const* const*, char const*) (getpot.h:558)
>     ==15661==    by 0x6D67312: _ZN6GetPotC9EiPKPKcS1_ (getpot.h:536)
>     ==15661==    by 0x6D6746D: GetPot::GetPot(int, char const* const*,
>     char const*) (getpot.h:65536)
>     ==15661==    by 0x6D76B97:
>     _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t
>     (libmesh.C:328)
>     ==15661==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int,
>     char const* const*, ompi_communicator_t*) (libmesh.C:0)
>     ==15661==    by 0x4387B0: main (main_RealHeart.cpp:295)
>     ==15661==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
>     ==15661==
>     ==15661== Process terminating with default action of signal 11
>     (SIGSEGV)
>     ==15661==  Access not within mapped region at address 0x0
>     ==15661==    at 0x6D675C9: GetPot::parse_command_line(int, char
>     const* const*, char const*) (getpot.h:558)
>     ==15661==    by 0x6D67312: _ZN6GetPotC9EiPKPKcS1_ (getpot.h:536)
>     ==15661==    by 0x6D6746D: GetPot::GetPot(int, char const* const*,
>     char const*) (getpot.h:65536)
>     ==15661==    by 0x6D76B97:
>     _ZN7libMesh11LibMeshInitC9EiPKPKcP19ompi_communicator_t
>     (libmesh.C:328)
>     ==15661==    by 0x6D793BD: libMesh::LibMeshInit::LibMeshInit(int,
>     char const* const*, ompi_communicator_t*) (libmesh.C:0)
>     ==15661==    by 0x4387B0: main (main_RealHeart.cpp:295)
>
>
>
> Do you process argv at all before you pass it to LibMeshInit?  What is 
> your prototype for main()?
>
> In this stack trace, the segfault seems to be from line 558 of 
> getpot.h in the parse_command_line() function.
>
> We have made a couple of small fixes in getpot.h since 0.9.2.2 came 
> out, you might want to try the most recent version from libmesh master 
> and see if it works for you...
>
> -- 
> John

-- 
Dafang Wang, Ph.D
Postdoctoral Fellow
Institute of Computational Medicine
Department of Biomedical Engineering
Johns Hopkins University
Hackerman Hall Room 218
Baltimore, MD, 21218
_______________________________________________
Libmesh-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-users
