Dear Pascal,

No, you do not need to try the other solution. I'm glad I could help.
(This confirms that we need to be careful with the vector pool between
different solver calls.)
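
For the archives, here is a minimal sketch of what releasing the pool
between optimization steps can look like. The loop and the function name
below are placeholders, not code from this thread; only the two
release_unused_memory() calls are the part under discussion:

  #include <deal.II/lac/vector_memory.h>
  #include <deal.II/lac/trilinos_vector.h>
  #include <deal.II/lac/trilinos_block_vector.h>

  using namespace dealii;

  void run_optimization(const unsigned int n_steps)
  {
    for (unsigned int step = 0; step < n_steps; ++step)
      {
        // ... assemble and solve; the GMRES solver requests temporary
        // TrilinosWrappers::MPI::BlockVector objects from the pool ...

        // Return the pooled temporaries (and the Epetra maps /
        // communicators they still reference) before the next step:
        GrowingVectorMemory<TrilinosWrappers::MPI::Vector>::release_unused_memory();
        GrowingVectorMemory<TrilinosWrappers::MPI::BlockVector>::release_unused_memory();
      }
  }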

Best,
Martin


On 16.03.2017 15:21, Pascal Kraft wrote:
> Hi Martin,
>
> I have tried a version with
> GrowingVectorMemory<TrilinosWrappers::MPI::BlockVector>::release_unused_memory()
> at the end of each step and reverted my change to trilinos_vector.cc
> l. 247 (back to the version from the deal.II sources), and it seems to
> work fine. I have not tried the other solution you proposed, should I?
> Would the result help you?
>
> Thank you a lot for your support! This had been driving me crazy :)
>
> Best,
> Pascal
>
> On Thursday, 16 March 2017 at 08:58:53 UTC+1, Martin Kronbichler wrote:
>
>     Dear Pascal,
>
>     You are right, in your case one needs to call
>     GrowingVectorMemory<TrilinosWrappers::MPI::BlockVector>::release_unused_memory()
>     rather than the plain Vector variant. Can you try that as well?
>
>     The problem appears to be that the call to SameAs returns
>     different results on different processors, which it should not,
>     which is why I suspect that there might be a stale communicator
>     object around. Another indication for that assumption is that you
>     get stuck in the initialization of the temporary vectors of the
>     GMRES solver, which is exactly that kind of situation.
>
>     As to the particular patch I referred to: it does release some
>     memory that might hold stale information, but it also changes some
>     of the call structures slightly. Could you try changing the
>     following:
>
>     if(vector->Map().SameAs(v.vector->Map()) == false)
>
>     to
>
>     if(v.vector->Map().SameAs(vector->Map()) == false)
>
>     Best, Martin
>
>     On 16.03.2017 01:28, Pascal Kraft wrote:
>>     Hi Martin,
>>     that didn't solve my problem. What I have done in the meantime is
>>     replace the check in line 247 of trilinos_vector.cc with true. I
>>     don't know if this causes memory leaks or anything, but my code
>>     seems to be working fine with that change.
>>     To your suggestion: Would I also have had to call the templated
>>     version for BlockVectors, or only the one for Vectors? I only
>>     tried the latter. Would I also have had to apply some patch to my
>>     deal.II library for it to work, or is the patch you mentioned
>>     simply that the functionality of the call
>>     GrowingVectorMemory<TrilinosWrappers::MPI::Vector>::release_unused_memory()
>>     is now included in some places?
>>     I have also been meaning to try MPICH instead of OpenMPI, because
>>     of a post about an internal OpenMPI error in which one of the
>>     functions appearing in the call stacks sometimes does not block
>>     properly.
>>     Thank you for your time and your fast responses - the whole
>>     library and the people developing it and making it available are
>>     simply awesome ;)
>>     Pascal
>>     On Wednesday, 15 March 2017 at 17:26:23 UTC+1, Martin Kronbichler wrote:
>>
>>         Dear Pascal,
>>
>>         This problem seems related to a problem we recently worked
>>         around in https://github.com/dealii/dealii/pull/4043
>>
>>         Can you check what happens if you call
>>         
>> GrowingVectorMemory<TrilinosWrappers::MPI::Vector>::release_unused_memory()
>>
>>         between your optimization steps? If a communicator gets stuck
>>         in those places, it is likely a stale object somewhere that we
>>         fail to work around for some reason.
>>
>>         Best, Martin
>>
>>         On 15.03.2017 14:10, Pascal Kraft wrote:
>>>         Dear Timo,
>>>         I have done some more digging and found out the following.
>>>         The problems seem to happen in trilinos_vector.cc between
>>>         lines 240 and 270.
>>>         What I see in the call stacks is that one process reaches
>>>         line 261 (ierr = vector->GlobalAssemble(last_action);)
>>>         and then waits inside this call at an MPI_Barrier with the
>>>         following stack:
>>>         20 <symbol is not available> 7fffd4d18f56
>>>         19 opal_progress()  7fffdc56dfca
>>>         18 ompi_request_default_wait_all()  7fffddd54b15
>>>         17 ompi_coll_tuned_barrier_intra_recursivedoubling()
>>>          7fffcf9abb5d
>>>         16 PMPI_Barrier()  7fffddd68a9c
>>>         15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f
>>>         14 Epetra_MpiDistributor::Do()  7fffe4089773
>>>         13 Epetra_DistObject::DoTransfer()  7fffe400a96a
>>>         12 Epetra_DistObject::Export()  7fffe400b7b7
>>>         11 int Epetra_FEVector::GlobalAssemble<int>()  7fffe4023d7f
>>>         10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3
>>>         The other (in my case three) processes are stuck in the head
>>>         of the if/else-if statement leading up to this point, namely
>>>         in the line
>>>         if(vector->Map().SameAs(v.vector->Map()) == false)
>>>         inside the call to SameAs(...) with stacks like
>>>         15 opal_progress()  7fffdc56dfbc
>>>         14 ompi_request_default_wait_all()  7fffddd54b15
>>>         13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913
>>>         12 PMPI_Allreduce()  7fffddd6587f
>>>         11 Epetra_MpiComm::MinAll()  7fffe408739e
>>>         10 Epetra_BlockMap::SameAs()  7fffe3fb9d74
>>>         Maybe this helps. Producing a smaller example will likely
>>>         not be possible in the coming two weeks, but if there is no
>>>         solution by then I can try.
>>>         Greetings,
>>>         Pascal
>>
>

