Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Attached is the simplest code that shows the problem for me on mac osx 10.13.3 + mpich.

Ran in debug mode:

without renumbering: runs fine
with cuthill_mckee renumbering:

    $ mpirun -np 2 ./step-40
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   PID 4712 RUNNING AT Praveens-MacBook-Pro.local
    =   EXIT CODE: 6
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===
    YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Abort trap: 6 (signal 6)
    This typically refers to a problem with your application.
    Please see the FAQ page for debugging suggestions

So the problem seems to occur inside the renumbering itself.

Best
praveen

step-40.cc
Description: Binary data

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
I ran the example code with renumbering on my macbook with mpich + osx 10.13 and I get the same type of crash when running in parallel.

I recently switched to mpich since I was having a memory leak with openmpi in one of my applications (not deal.II related). I have not extensively tested my deal.II programs with mpich yet.

Best
praveen

> On 28-Jan-2018, at 9:53 AM, Wolfgang Bangerth wrote:
>
> On 01/27/2018 09:17 PM, Jie Cheng wrote:
>> I finally fixed it! If there is anyone who has the same problem as I
>> described: replacing MPICH with open-mpi resolves this issue. I do not know
>> why, but this method has fixed two macs for me: I have never installed
>> open-mpi before so there were no conflicting between MPICH and open-mpi. I
>> still think something is wrong among dealii, mpich and mac OS 10.13.
>
> Jie -- I'm glad to hear you made it work!
>
> Everyone who is in software for long enough will be prudent to say that it's
> likely that these interactions exist. It's possible that dealii, mpich and
> mac OS 10.13 don't work well together. The difficulty is finding a testcase
> that clearly demonstrates that this is so, and then being able to understand
> what exactly is happening. I tried the program you sent earlier this week and
> it works just fine for me. I think that without a debugger backtrace and
> other information, it's going to be difficult to figure out what concretely
> doesn't work. But it's good to know that at least you have found a workaround.
>
> Best
> W.
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
On 01/27/2018 09:17 PM, Jie Cheng wrote:
> I finally fixed it! If there is anyone who has the same problem as I
> described: replacing MPICH with open-mpi resolves this issue. I do not know
> why, but this method has fixed two macs for me: I have never installed
> open-mpi before so there were no conflicting between MPICH and open-mpi. I
> still think something is wrong among dealii, mpich and mac OS 10.13.

Jie -- I'm glad to hear you made it work!

Everyone who is in software for long enough will be prudent to say that it's likely that these interactions exist. It's possible that dealii, mpich and mac OS 10.13 don't work well together. The difficulty is finding a testcase that clearly demonstrates that this is so, and then being able to understand what exactly is happening. I tried the program you sent earlier this week and it works just fine for me. I think that without a debugger backtrace and other information, it's going to be difficult to figure out what concretely doesn't work. But it's good to know that at least you have found a workaround.

Best
W.

--
Wolfgang Bangerth
email: bange...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/
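A note on getting the backtrace asked for above without a debugger: "Abort trap: 6" is SIGABRT, the signal raised by abort(), for example after a failed assertion or an uncaught exception. One possible way to see where the abort came from is to install a SIGABRT handler that dumps the call stack to stderr. The following is only a sketch under stated assumptions: the names capture_frames, abort_handler and install_abort_backtrace are made up for this illustration, backtrace() and backtrace_symbols_fd() come from <execinfo.h> (provided by glibc and macOS), and calling them inside a signal handler is not strictly async-signal-safe, so this is a debugging aid rather than production code.

```cpp
#include <cassert>
#include <csignal>
#include <execinfo.h> // backtrace, backtrace_symbols_fd (glibc, macOS)
#include <unistd.h>   // STDERR_FILENO

// Hypothetical helper: store up to max_frames return addresses of the
// calling thread in buffer and return how many were captured.
int capture_frames(void **buffer, int max_frames)
{
  assert(buffer != nullptr && max_frames > 0);
  return backtrace(buffer, max_frames);
}

// Hypothetical SIGABRT handler: print the raw stack trace to stderr, then
// restore the default action and re-raise so the process still terminates
// with SIGABRT (mpirun will still report signal 6).
extern "C" void abort_handler(int sig)
{
  void *frames[64];
  const int n_frames = backtrace(frames, 64);
  backtrace_symbols_fd(frames, n_frames, STDERR_FILENO);

  std::signal(sig, SIG_DFL);
  std::raise(sig);
}

// Hypothetical helper: call once near the top of main().
void install_abort_backtrace()
{
  std::signal(SIGABRT, abort_handler);
}
```

Calling install_abort_backtrace() as the first statement of main() should then make every aborting rank print its frames before mpirun prints the cleanup banner. The symbol names are mangled; piping the output through c++filt makes them readable.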
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Hi Bruno and Wolfgang

I finally fixed it! If there is anyone who has the same problem as I described: replacing MPICH with open-mpi resolves this issue. I do not know why, but this method has fixed two macs for me. I had never installed open-mpi before, so there was no conflict between MPICH and open-mpi. I still think something is wrong in the combination of dealii, mpich and mac OS 10.13.

Jie
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Hi Wolfgang

Here is the program. I did nothing but add a DoFRenumbering::Cuthill_McKee(dof_handler) call at line 251. I tried to debug with lldb but did not gain any useful information. I will check the video again to make sure I was doing it right.

Thank you very much!
Jie

On Tue, Jan 23, 2018 at 11:28 PM Wolfgang Bangerth wrote:
> On 01/23/2018 01:16 PM, Jie Cheng wrote:
>>> These are just warnings -- what happens if you run the executable?
>>
>> If I do not modify step-40.cc, it runs fine both in serial and parallel. After
>> I added DoFRenumbering in it, it crashes in parallel. Does DoFRenumbering have
>> any dependency?
>
> Not that I know of. Remind which function in that namespace you are using?
>
>> As I posted in previous messages, the debugger does not help
>> much. The same problem happens on another mac in my lab which has Mac OS
>> 10.13.2 installed.
>
> This is certainly strange. I wouldn't know what might cause this, and without
> having a backtrace of where things go wrong, it's just really hard to tell.
>
> If you send me the program again, I can try to run it on my linux boxes to see
> whether it's something I can reproduce. Otherwise, have you tried the
> technique for running a parallel program in a debugger I discuss in one of my
> video lectures?
>
> Best
> W.

step-40.cc
Description: Binary data
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
On 01/23/2018 01:16 PM, Jie Cheng wrote:
>> These are just warnings -- what happens if you run the executable?
>
> If I do not modify step-40.cc, it runs fine both in serial and parallel. After
> I added DoFRenumbering in it, it crashes in parallel. Does DoFRenumbering have
> any dependency?

Not that I know of. Remind which function in that namespace you are using?

> As I posted in previous messages, the debugger does not help
> much. The same problem happens on another mac in my lab which has Mac OS
> 10.13.2 installed.

This is certainly strange. I wouldn't know what might cause this, and without having a backtrace of where things go wrong, it's just really hard to tell.

If you send me the program again, I can try to run it on my linux boxes to see whether it's something I can reproduce. Otherwise, have you tried the technique for running a parallel program in a debugger I discuss in one of my video lectures?

Best
W.

--
Wolfgang Bangerth
email: bange...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Hi Wolfgang

> These are just warnings -- what happens if you run the executable?

If I do not modify step-40.cc, it runs fine both in serial and parallel. After I added DoFRenumbering in it, it crashes in parallel. Does DoFRenumbering have any dependency? As I posted in previous messages, the debugger does not help much. The same problem happens on another mac in my lab which has Mac OS 10.13.2 installed.

Can anyone reproduce this problem on Mac OS X 10.13.2?

Thanks
Jie
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
On 01/22/2018 09:17 PM, Jie Cheng wrote:
> I've reinstalled MPICH, and did a clean build of p4est, petsc and dealii, this
> problem still exists. At the linking stage of building dealii, I got warnings:
>
> [526/579] Linking CXX shared library lib/libdeal_II.9.0.0-pre.dylib
> ld: warning: could not create compact unwind for _dgehrd_: stack subq
> instruction is too different from dwarf stack size

These are just warnings -- what happens if you run the executable?

The function referenced here is a LAPACK function. It looks like it has been compiled for a different system than the one you're on. But the error message you show doesn't give away how this library got onto your system, so I don't know what to suggest.

Best
W.

--
Wolfgang Bangerth
email: bange...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
I've reinstalled MPICH and did a clean build of p4est, petsc and dealii, but the problem still exists. At the linking stage of building dealii, I got these warnings:

    [526/579] Linking CXX shared library lib/libdeal_II.9.0.0-pre.dylib
    ld: warning: could not create compact unwind for _dgehrd_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _dhseqr_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _sgehrd_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _shseqr_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _dormlq_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _dormql_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _dormqr_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _sormlq_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _sormql_: stack subq instruction is too different from dwarf stack size
    ld: warning: could not create compact unwind for _sormqr_: stack subq instruction is too different from dwarf stack size

I am not sure if this is related. Does anyone have a clue?

Thanks
Jie
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Bruno,

> This says that you have a problem with MPI. Your program died before
> executing any code in deal.II. Can you run other MPI codes?

Yes, I can run step-40 in parallel. This problem happens only after I added a DoFRenumbering call in it (step-40.cc originally does not call DoFRenumbering). What's even weirder is that running on 1 processor without a debugger is OK, but if I use lldb, it immediately crashes as shown above.

Jie
Re: [deal.II] Re: Parallel DoFRenumbering does not work on mac
Jie,

2018-01-18 14:31 GMT-05:00 Jie Cheng:
> Fatal error in MPI_Init_thread: Other MPI error, error stack:
> MPIR_Init_thread(474):
> MPID_Init(152)...: channel initialization failed
> MPID_Init(426)...: PMI_Get_appnum returned -1
> [cli_0]: write_line error; fd=6 buf=:cmd=abort exitcode=1094159
> :
> system msg for write_line failure : Bad file descriptor
> Process 73295 exited with status = 15 (0x000f)

This says that you have a problem with MPI. Your program died before executing any code in deal.II. Can you run other MPI codes?

Best,
Bruno
[deal.II] Re: Parallel DoFRenumbering does not work on mac
Hi Bruno

> There is not enough information for us to help you, this error message is
> not from deal. At this point the code has failed and this is just a
> cleaning operation. Have you tried running your code in a debugger?

When I run `lldb ./step-40` everything runs smoothly. But if I do `mpirun -n 1 lldb ./step-40`, I get:

    zhangl-macpro12:step-40 jiecheng$ mpirun -n 1 lldb ./step-40
    (lldb) target create "./step-40"
    Current executable set to './step-40' (x86_64).
    run
    (lldb) run
    Process 73295 launched: './step-40' (x86_64)
    [cli_0]: write_line error; fd=6 buf=:cmd=init pmi_version=1 pmi_subversion=1
    :
    system msg for write_line failure : Bad file descriptor
    [cli_0]: Unable to write to PMI_fd
    [cli_0]: write_line error; fd=6 buf=:cmd=get_appnum
    :
    system msg for write_line failure : Bad file descriptor
    Fatal error in MPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(474):
    MPID_Init(152)...: channel initialization failed
    MPID_Init(426)...: PMI_Get_appnum returned -1
    [cli_0]: write_line error; fd=6 buf=:cmd=abort exitcode=1094159
    :
    system msg for write_line failure : Bad file descriptor
    Process 73295 exited with status = 15 (0x000f)

Is this useful information?

Jie
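The `Bad file descriptor` errors in this log would be consistent with the program being launched by lldb rather than by mpirun, so that it no longer has the PMI connection mpirun prepared for it. That reading is an assumption and is not confirmed anywhere in this thread. A commonly used way around it is to start the program normally under mpirun, let each process wait, and attach the debugger afterwards by PID. A minimal sketch, assuming a hypothetical environment variable WAIT_FOR_DEBUGGER and a hypothetical helper wait_for_debugger(); neither is part of deal.II or MPICH:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <unistd.h> // getpid (POSIX)

// Flag to be set to a nonzero value from inside the attached debugger.
// It is volatile so the waiting loop re-reads it on every iteration.
volatile int debugger_attached = 0;

// Hypothetical helper: if the (made-up) environment variable
// WAIT_FOR_DEBUGGER is set, print this process's PID and sleep in a loop
// until `debugger_attached` becomes nonzero.
// Returns true if the wait path was taken, false if it returned at once.
bool wait_for_debugger(const int poll_interval_ms = 200)
{
  assert(poll_interval_ms > 0);

  if (std::getenv("WAIT_FOR_DEBUGGER") == nullptr)
    return false;

  std::fprintf(stderr,
               "PID %ld is waiting for a debugger to attach\n",
               static_cast<long>(getpid()));
  std::fflush(stderr);

  while (debugger_attached == 0)
    std::this_thread::sleep_for(std::chrono::milliseconds(poll_interval_ms));

  return true;
}
```

With a call to wait_for_debugger() at the top of main(), the idea would be to run `WAIT_FOR_DEBUGGER=1 mpirun -np 2 ./step-40`, note the printed PIDs, and in a second terminal run `lldb -p <PID>`, then `expr debugger_attached = 1` followed by `continue`. The crash should then happen under the debugger, where `bt` prints the backtrace.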
[deal.II] Re: Parallel DoFRenumbering does not work on mac
Jie,

On Thursday, January 18, 2018 at 11:17:55 AM UTC-5, Jie Cheng wrote:
> I found that DoFRenumbering with distributed dof_handler does not work on
> my mac (Mac OS X 10.13.2). My dealii is built with PETSc and Step-40 runs
> fine. But once I added DoFRenumbering::Cuthill_McKee(dof_handler)
> after dof_handler.distribute_dofs (fe), it gave me runtime error:
>
> ===
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 65482 RUNNING AT zhangl-macpro12
> =   EXIT CODE: 6
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Abort trap: 6 (signal 6)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions

There is not enough information for us to help you; this error message is not from deal.II. At this point the code has already failed and this is just a cleanup operation. Have you tried running your code in a debugger?

Best,
Bruno
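For readers who have not used the function named in the original report: Cuthill-McKee renumbering reorders unknowns so that coupled unknowns receive nearby indices, which reduces the bandwidth of the resulting matrix. The sketch below is the textbook serial algorithm on a plain adjacency list. It is not deal.II's implementation and says nothing about how deal.II handles a distributed dof_handler; the names cuthill_mckee and bandwidth are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>; // undirected adjacency lists

// Textbook Cuthill-McKee: breadth-first search that starts from a node of
// minimal degree and visits neighbors in order of increasing degree.
// Returns new_index, where new_index[old] is the new number of node `old`.
std::vector<std::size_t> cuthill_mckee(const Graph &adj)
{
  const std::size_t        n = adj.size();
  std::vector<std::size_t> new_index(n, n);
  std::vector<bool>        visited(n, false);
  std::size_t              next_number = 0;

  // Order by degree, breaking ties by the old index.
  const auto by_degree = [&adj](const std::size_t a, const std::size_t b) {
    return adj[a].size() != adj[b].size() ? adj[a].size() < adj[b].size() : a < b;
  };

  while (next_number < n)
    {
      // Start each connected component at an unvisited node of minimal degree.
      std::size_t start = n;
      for (std::size_t i = 0; i < n; ++i)
        if (!visited[i] && (start == n || by_degree(i, start)))
          start = i;

      std::queue<std::size_t> queue;
      queue.push(start);
      visited[start] = true;

      while (!queue.empty())
        {
          const std::size_t v = queue.front();
          queue.pop();
          new_index[v] = next_number++;

          std::vector<std::size_t> unvisited_neighbors;
          for (const std::size_t w : adj[v])
            {
              assert(w < n);
              if (!visited[w])
                {
                  visited[w] = true;
                  unvisited_neighbors.push_back(w);
                }
            }
          std::sort(unvisited_neighbors.begin(), unvisited_neighbors.end(), by_degree);
          for (const std::size_t w : unvisited_neighbors)
            queue.push(w);
        }
    }
  return new_index;
}

// Largest index distance between two connected nodes under a numbering.
std::size_t bandwidth(const Graph &adj, const std::vector<std::size_t> &new_index)
{
  std::size_t result = 0;
  for (std::size_t v = 0; v < adj.size(); ++v)
    for (const std::size_t w : adj[v])
      {
        const std::size_t a = new_index[v], b = new_index[w];
        result = std::max(result, a > b ? a - b : b - a);
      }
  return result;
}
```

As a small check, take a chain of five nodes connected in the order 0-3-1-4-2, i.e. the adjacency lists {{3},{3,4},{4},{0,1},{1,2}}. With the original numbering the bandwidth is 3; after cuthill_mckee() the nodes are numbered along the chain and the bandwidth drops to 1.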