Dear Timo,

Thanks for your advice. I am running the program on three different
computers: my notebook, my research group's server, and the local cluster.
On all of them there is a (small) chance of suddenly hitting the NaN.
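
To narrow down where the NaN first shows up, a check like the one below
could be applied to the right-hand side before the solve and to the
solution after it. This is only a minimal sketch in plain C++: the helper
name first_non_finite is hypothetical, and it works on a
std::vector<double>, so the values would first have to be copied out of
the PETSc vector.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first entry that is NaN or +/-inf,
// or values.size() if every entry is finite.
std::size_t first_non_finite(const std::vector<double> &values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    if (!std::isfinite(values[i]))
      return i;
  return values.size();
}
```

If the right-hand side is finite on every rank but the solution is not,
that would point at the solve step rather than at the assembly.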

According to MUST, there is clearly a problem inside deal.II, MUMPS, or
PETSc, as can be seen in the file I sent in my previous message.
Namely,

        Invalid MPI_Op, error stack:
        MPI_Op_free(111): MPI_Op_free(op=0x7ffcf6298dac) failed
        MPI_Op_free(75).: Null Op pointer

Whether this error leads to the issues I am having is up for discussion.

I tried using PETSc's SolverPreOnly:

        SolverControl solver_control;
        PETScWrappers::SolverPreOnly solver(solver_control, mpi_communicator);
        PETScWrappers::PreconditionLU preconditioner(system_matrix);
        solver.solve(system_matrix, distributed_dU, system_rhs, preconditioner);

It shows the same issues, as expected. I will follow your advice and try to
use a different solver.
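
On the possibility you raised that one processor has no DoFs, a check
along these lines might help rule it out. It is only a sketch in plain
C++: the helper name ranks_without_dofs is hypothetical, and it assumes
the number of locally owned DoFs on each rank has already been gathered
into a std::vector (how that is done in deal.II is not shown here).

```cpp
#include <cstddef>
#include <vector>

// owned_per_rank[r] is the number of DoFs owned by MPI rank r (assumed
// to have been gathered beforehand). Returns the ranks that own none.
std::vector<std::size_t>
ranks_without_dofs(const std::vector<std::size_t> &owned_per_rank)
{
  std::vector<std::size_t> empty_ranks;
  for (std::size_t r = 0; r < owned_per_rank.size(); ++r)
    if (owned_per_rank[r] == 0)
      empty_ranks.push_back(r);
  return empty_ranks;
}
```

An empty result would mean every rank owns at least one DoF, so that
particular explanation could be set aside.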

Still, would it be possible for you to comment a bit more on those
MPI_Op_free errors?

Cheers,
Lucas

On 17 November 2017 at 15:14, Timo Heister <heis...@clemson.edu> wrote:

> Lucas,
>
> Those kinds of bugs are hard to find. Honestly, the bug could still be
> in your code, inside deal.II, inside PETSc, inside MUMPS, or related
> to the software/hardware you are running on.
>
> I know this won't be of much help, but I would suggest you try a
> different solver to see if MUMPS is the problematic part here. Maybe
> they are doing some invalid operations (maybe one processor has no
> DoFs?). Try to simplify your test problem as much as possible. If the
> problem is small enough, test on a different machine
> (workstation/laptop), run using valgrind, etc.
>
> Best,
> Timo
>
>
> On Fri, Nov 17, 2017 at 8:13 AM, Lucas Campos <rmk...@gmail.com> wrote:
> > Sorry, I forgot to include the link to MUST. Here it is:
> > https://doc.itc.rwth-aachen.de/display/CCP/Project+MUST
> >
> >
> > On Friday, 17 November 2017 14:12:18 UTC+1, Lucas Campos wrote:
> >>
> >> Dear all,
> >>
> >> First of all, a bit of context:
> >> I am trying to debug an error in my application where I randomly start
> >> seeing NaNs. The probability of this increases with the number of MPI
> >> processes I use, so it looks like a data race of some sort. Any
> >> advice on the best way to find the error?
> >>
> >> My current approach is to use project MUST[1] to help me find the issues.
> >> When I ran MUST with the debug version of my code on the local cluster, it
> >> returned errors related to the MPI internals of dealii/petsc(/MUMPS?).
> >> An example output can be seen in errors.txt. The output stopping at
> >> "Solving... " suggested that the error was between the following lines of
> >> my code:
> >>
> >>>> PetscPrintf(mpi_communicator, "Solving... \n");
> >>>> computing_timer.enter_section("solve");
> >>>>
> >>>> SolverControl cn;
> >>>> PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> >>>> solver.set_symmetric_mode(false);
> >>>> solver.solve(system_matrix, distributed_dU, system_rhs);
> >>>>
> >>>> computing_timer.exit_section("solve");
> >>>> PetscPrintf(mpi_communicator, "Solved! \n");
> >>
> >> Indeed, when I comment out the "solver.solve(system_matrix,
> >> distributed_dU, system_rhs);" line, it runs with no errors at all.
> >>
> >> Could this be the source of my issues? Also, how can I solve this specific
> >> issue?
> >
> > --
> > The deal.II project is located at http://www.dealii.org/
> > For mailing list/forum options, see
> > https://groups.google.com/d/forum/dealii?hl=en
> > ---
> > You received this message because you are subscribed to the Google Groups
> > "deal.II User Group" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to dealii+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
>
>
> --
> Timo Heister
> http://www.math.clemson.edu/~heister/

