Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-16 Thread Martin Kronbichler
Dear Pascal,

No, you do not need to try the other solution. I'm glad I could help.
(This confirms that we need to be careful with the vector
pool between different calls.)

Best,
Martin


On 16.03.2017 15:21, Pascal Kraft wrote:
> Hi Martin,
>
> I have tried a version with
> GrowingVectorMemory::release_unused_memory()
> at the end of each step and removed my change to trilinos_vector.cc
> l.247 (back to the version from dealii source) and it seems to work
> fine. I have not tried the other solution you proposed, should I?
> Would the result help you?
>
> Thank you a lot for your support! This had been driving me crazy :)
>
> Best,
> Pascal
>
> Am Donnerstag, 16. März 2017 08:58:53 UTC+1 schrieb Martin Kronbichler:
>
> Dear Pascal,
>
> You are right, in your case one needs to call
> GrowingVectorMemory::release_unused_memory()
> for the block vector type rather than for the plain vector type. Can you try that as well?
>
> The problem appears to be that the call to SameAs returns
> different results on different processors, which it should not,
> which is why I suspect that there might be some stale
> communicator object around. Another indication for that assumption
> is that you get stuck in the initialization of the temporary
> vectors of the GMRES solver, which is exactly this kind of situation.
>
> As to the particular patch I referred to: It does release some
> memory that might have stale information but it also changes some
> of the call structures slightly. Could you try to change the
> following:
>
> if(vector->Map().SameAs(v.vector->Map()) == false)
>
> to
>
> if(v.vector->Map().SameAs(vector->Map()) == false)
>
> Best, Martin
>
> On 16.03.2017 01:28, Pascal Kraft wrote:
>> Hi Martin,
>> that didn't solve my problem. What I have done in the meantime is
>> replace the check in line 247 of trilinos_vector.cc with true. I
>> don't know if this causes memory leaks or anything but my code
>> seems to be working fine with that change. 
>> To your suggestion: Would I have also had to call the templated
>> version for BlockVectors or only for Vectors? I only tried the
>> latter. Would I have had to also apply some patch to my dealii
>> library for it to work or is the patch you talked about simply
>> that you included the functionality of the call
>> 
>> GrowingVectorMemory::release_unused_memory()
>> in some places?
>> I have also wanted to try MPICH instead of OpenMPI because of a
>> post about an internal error in OpenMPI and one of the functions
>> appearing in the call stacks sometimes not blocking properly.
>> Thank you for your time and your fast responses - the whole
>> library and the people developing it and making it available are
>> simply awesome ;)
>> Pascal
>> Am Mittwoch, 15. März 2017 17:26:23 UTC+1 schrieb Martin Kronbichler:
>>
>> Dear Pascal,
>>
>> This problem seems related to a problem we recently worked
>> around in https://github.com/dealii/dealii/pull/4043
>> 
>>
>> Can you check what happens if you call
>> 
>> GrowingVectorMemory::release_unused_memory()
>>
>> between your optimization steps? If a communicator gets stuck
>> in those places, it is likely a stale object somewhere that we
>> fail to work around for some reason.
>>
>> Best, Martin
>>
>> On 15.03.2017 14:10, Pascal Kraft wrote:
>>> Dear Timo,
>>> I have done some more digging and found out the following.
>>> The problems seem to happen in trilinos_vector.cc between
>>> the lines 240 and 270.
>>> What I see on the call stacks is that one process reaches
>>> line 261 ( ierr = vector->GlobalAssemble (last_action); )
>>> and then waits inside this call at an MPI_Barrier with the
>>> following stack:
>>> 20  7fffd4d18f56
>>> 19 opal_progress()  7fffdc56dfca
>>> 18 ompi_request_default_wait_all()  7fffddd54b15
>>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()
>>>  7fffcf9abb5d
>>> 16 PMPI_Barrier()  7fffddd68a9c
>>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f
>>> 14 Epetra_MpiDistributor::Do()  7fffe4089773
>>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a
>>> 12 Epetra_DistObject::Export()  7fffe400b7b7
>>> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f
>>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3
>>> The other (in my case three) processes are stuck in the head
>>> of the if/else-if statement leading up to this point, namely
>>> in the line
>>> if (vector->Map().SameAs(v.vector->Map()) == false)

Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-16 Thread Pascal Kraft
Hi Martin,

I have tried a version with
GrowingVectorMemory::release_unused_memory()
at the end of each step and removed my change to trilinos_vector.cc l.247 
(back to the version from dealii source) and it seems to work fine. I have 
not tried the other solution you proposed, should I? Would the result help 
you?

Thank you a lot for your support! This had been driving me crazy :)

Best,
Pascal

Am Donnerstag, 16. März 2017 08:58:53 UTC+1 schrieb Martin Kronbichler:
>
> Dear Pascal,
>
> You are right, in your case one needs to call
> GrowingVectorMemory::release_unused_memory()
> for the block vector type rather than for the plain vector type. Can you try that as well?
>
> The problem appears to be that the call to SameAs returns different 
> results on different processors, which it should not, which is why I 
> suspect that there might be some stale communicator object around. Another 
> indication for that assumption is that you get stuck in the initialization 
> of the temporary vectors of the GMRES solver, which is exactly this kind of 
> situation.
>
> As to the particular patch I referred to: It does release some memory that 
> might have stale information but it also changes some of the call 
> structures slightly. Could you try to change the following:
>
> if (vector->Map().SameAs(v.vector->Map()) == false)
>
> to 
>
> if (v.vector->Map().SameAs(vector->Map()) == false)
>
> Best, Martin 
> On 16.03.2017 01:28, Pascal Kraft wrote: 
>
> Hi Martin,
> that didn't solve my problem. What I have done in the meantime is replace 
> the check in line 247 of trilinos_vector.cc with true. I don't know if this 
> causes memory leaks or anything but my code seems to be working fine with 
> that change. 
> To your suggestion: Would I have also had to call the templated version 
> for BlockVectors or only for Vectors? I only tried the latter. Would I have 
> had to also apply some patch to my dealii library for it to work or is the 
> patch you talked about simply that you included the functionality of the 
> call 
> GrowingVectorMemory::release_unused_memory() 
> in some places?
> I have also wanted to try MPICH instead of OpenMPI because of a post about 
> an internal error in OpenMPI and one of the functions appearing in the call 
> stacks sometimes not blocking properly.
> Thank you for your time and your fast responses - the whole library and 
> the people developing it and making it available are simply awesome ;)
> Pascal
> Am Mittwoch, 15. März 2017 17:26:23 UTC+1 schrieb Martin Kronbichler:
>>
>> Dear Pascal,
>>
>> This problem seems related to a problem we recently worked around in 
>> https://github.com/dealii/dealii/pull/4043
>>
>> Can you check what happens if you call 
>> GrowingVectorMemory::release_unused_memory()
>>
>> between your optimization steps? If a communicator gets stuck in those 
>> places, it is likely a stale object somewhere that we fail to work around 
>> for some reason.
>>
>> Best, Martin 
>> On 15.03.2017 14:10, Pascal Kraft wrote: 
>>
>> Dear Timo, 
>> I have done some more digging and found out the following. The problems 
>> seem to happen in trilinos_vector.cc between the lines 240 and 270.
>> What I see on the call stacks is that one process reaches line 261 
>> ( ierr = vector->GlobalAssemble (last_action); ) and then waits inside this 
>> call at an MPI_Barrier with the following stack:
>> 20  7fffd4d18f56 
>> 19 opal_progress()  7fffdc56dfca 
>> 18 ompi_request_default_wait_all()  7fffddd54b15 
>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
>> 16 PMPI_Barrier()  7fffddd68a9c 
>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
>> 14 Epetra_MpiDistributor::Do()  7fffe4089773 
>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
>> 12 Epetra_DistObject::Export()  7fffe400b7b7 
>> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f 
>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 
>> The other (in my case three) processes are stuck in the head of the 
>> if/else-if statement leading up to this point, namely in the line 
>> if (vector->Map().SameAs(v.vector->Map()) == false) 
>> inside the call to SameAs(...) with stacks like
>> 15 opal_progress()  7fffdc56dfbc 
>> 14 ompi_request_default_wait_all()  7fffddd54b15 
>> 13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913 
>> 12 PMPI_Allreduce()  7fffddd6587f 
>> 11 Epetra_MpiComm::MinAll()  7fffe408739e 
>> 10 Epetra_BlockMap::SameAs()  7fffe3fb9d74 
>> Maybe this helps. Producing a smaller example will likely not be possible 
>> in the coming two weeks but if there are no solutions until then I can try.
>> Greetings,
>> Pascal

Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-16 Thread Pascal Kraft
Dear Martin,

my local machine is tied up with a Valgrind run at the moment, but as soon as 
that is done with one step I will put these changes in right away and post 
the results here (within 6 hours).
From what I make of the call stacks, one process somehow gets out of the 
SameAs() call without being blocked by MPI, and the others are then forced to 
wait during the Allreduce call. How or where that happens I will try to 
figure out later today. SDM is now working well in my Eclipse setup and I 
hope to be able to track down the problem.

Best,
Pascal

Am Donnerstag, 16. März 2017 08:58:53 UTC+1 schrieb Martin Kronbichler:
>
> Dear Pascal,
>
> You are right, in your case one needs to call
> GrowingVectorMemory::release_unused_memory()
> for the block vector type rather than for the plain vector type. Can you try that as well?
>
> The problem appears to be that the call to SameAs returns different 
> results on different processors, which it should not, which is why I 
> suspect that there might be some stale communicator object around. Another 
> indication for that assumption is that you get stuck in the initialization 
> of the temporary vectors of the GMRES solver, which is exactly this kind of 
> situation.
>
> As to the particular patch I referred to: It does release some memory that 
> might have stale information but it also changes some of the call 
> structures slightly. Could you try to change the following:
>
> if (vector->Map().SameAs(v.vector->Map()) == false)
>
> to 
>
> if (v.vector->Map().SameAs(vector->Map()) == false)
>
> Best, Martin 
> On 16.03.2017 01:28, Pascal Kraft wrote: 
>
> Hi Martin,
> that didn't solve my problem. What I have done in the meantime is replace 
> the check in line 247 of trilinos_vector.cc with true. I don't know if this 
> causes memory leaks or anything but my code seems to be working fine with 
> that change. 
> To your suggestion: Would I have also had to call the templated version 
> for BlockVectors or only for Vectors? I only tried the latter. Would I have 
> had to also apply some patch to my dealii library for it to work or is the 
> patch you talked about simply that you included the functionality of the 
> call 
> GrowingVectorMemory::release_unused_memory() 
> in some places?
> I have also wanted to try MPICH instead of OpenMPI because of a post about 
> an internal error in OpenMPI and one of the functions appearing in the call 
> stacks sometimes not blocking properly.
> Thank you for your time and your fast responses - the whole library and 
> the people developing it and making it available are simply awesome ;)
> Pascal
> Am Mittwoch, 15. März 2017 17:26:23 UTC+1 schrieb Martin Kronbichler:
>>
>> Dear Pascal,
>>
>> This problem seems related to a problem we recently worked around in 
>> https://github.com/dealii/dealii/pull/4043
>>
>> Can you check what happens if you call 
>> GrowingVectorMemory::release_unused_memory()
>>
>> between your optimization steps? If a communicator gets stuck in those 
>> places, it is likely a stale object somewhere that we fail to work around 
>> for some reason.
>>
>> Best, Martin 
>> On 15.03.2017 14:10, Pascal Kraft wrote: 
>>
>> Dear Timo, 
>> I have done some more digging and found out the following. The problems 
>> seem to happen in trilinos_vector.cc between the lines 240 and 270.
>> What I see on the call stacks is that one process reaches line 261 
>> ( ierr = vector->GlobalAssemble (last_action); ) and then waits inside this 
>> call at an MPI_Barrier with the following stack:
>> 20  7fffd4d18f56 
>> 19 opal_progress()  7fffdc56dfca 
>> 18 ompi_request_default_wait_all()  7fffddd54b15 
>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
>> 16 PMPI_Barrier()  7fffddd68a9c 
>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
>> 14 Epetra_MpiDistributor::Do()  7fffe4089773 
>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
>> 12 Epetra_DistObject::Export()  7fffe400b7b7 
>> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f 
>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 
>> The other (in my case three) processes are stuck in the head of the 
>> if/else-if statement leading up to this point, namely in the line 
>> if (vector->Map().SameAs(v.vector->Map()) == false) 
>> inside the call to SameAs(...) with stacks like
>> 15 opal_progress()  7fffdc56dfbc 
>> 14 ompi_request_default_wait_all()  7fffddd54b15 
>> 13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913 
>> 12 PMPI_Allreduce()  7fffddd6587f 
>> 11 Epetra_MpiComm::MinAll()  7fffe408739e 
>> 10 Epetra_BlockMap::SameAs()  7fffe3fb9d74 
>> Maybe this helps. Producing a smaller example will likely not be possible 
>> in the coming two weeks but if there are no solutions until then I can try.
>> Greetings,
>>

Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-16 Thread Martin Kronbichler
Dear Pascal,

You are right, in your case one needs to call
GrowingVectorMemory::release_unused_memory()
for the block vector type rather than for the plain vector type. Can you try that as well?
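
For reference, a minimal sketch of the two cleanup calls being discussed (a hedged illustration, not code from this thread; the header names and template arguments assume the deal.II 8.4 Trilinos MPI wrappers mentioned elsewhere in the thread):

#include <deal.II/lac/vector_memory.h>
#include <deal.II/lac/trilinos_vector.h>
#include <deal.II/lac/trilinos_block_vector.h>

// Drop all pooled temporary vectors (such as the ones GMRES obtains from
// GrowingVectorMemory) so that none of them keeps a stale Epetra map or
// MPI communicator alive across optimization steps.
void release_solver_temporaries()
{
  dealii::GrowingVectorMemory<dealii::TrilinosWrappers::MPI::Vector>::release_unused_memory();
  dealii::GrowingVectorMemory<dealii::TrilinosWrappers::MPI::BlockVector>::release_unused_memory();
}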

The problem appears to be that the call to SameAs returns different
results on different processors, which it should not, which is why I
suspect that there might be some stale communicator object around.
Another indication for that assumption is that you get stuck in the
initialization of the temporary vectors of the GMRES solver, which is
exactly this kind of situation.

As to the particular patch I referred to: It does release some memory
that might have stale information but it also changes some of the call
structures slightly. Could you try to change the following:

if (vector->Map().SameAs(v.vector->Map()) == false)

to

if (v.vector->Map().SameAs(vector->Map()) == false)

Best, Martin

On 16.03.2017 01:28, Pascal Kraft wrote:
> Hi Martin,
> that didn't solve my problem. What I have done in the meantime is
> replace the check in line 247 of trilinos_vector.cc with true. I don't
> know if this causes memory leaks or anything but my code seems to be
> working fine with that change. 
> To your suggestion: Would I have also had to call the templated
> version for BlockVectors or only for Vectors? I only tried the latter.
> Would I have had to also apply some patch to my dealii library for it
> to work or is the patch you talked about simply that you included the
> functionality of the call
> GrowingVectorMemory::release_unused_memory()
> in some places?
> I have also wanted to try MPICH instead of OpenMPI because of a post
> about an internal error in OpenMPI and one of the functions appearing
> in the call stacks sometimes not blocking properly.
> Thank you for your time and your fast responses - the whole library
> and the people developing it and making it available are simply awesome ;)
> Pascal
> Am Mittwoch, 15. März 2017 17:26:23 UTC+1 schrieb Martin Kronbichler:
>
> Dear Pascal,
>
> This problem seems related to a problem we recently worked around
> in https://github.com/dealii/dealii/pull/4043
> 
>
> Can you check what happens if you call
> 
> GrowingVectorMemory::release_unused_memory()
>
> between your optimization steps? If a communicator gets stuck in
> those places, it is likely a stale object somewhere that we fail to
> work around for some reason.
>
> Best, Martin
>
> On 15.03.2017 14:10, Pascal Kraft wrote:
>> Dear Timo,
>> I have done some more digging and found out the following. The
>> problems seem to happen in trilinos_vector.cc between the lines
>> 240 and 270.
>> What I see on the call stacks is that one process reaches line
>> 261 ( ierr = vector->GlobalAssemble (last_action); ) and then
>> waits inside this call at an MPI_Barrier with the following stack:
>> 20  7fffd4d18f56
>> 19 opal_progress()  7fffdc56dfca
>> 18 ompi_request_default_wait_all()  7fffddd54b15
>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d
>> 16 PMPI_Barrier()  7fffddd68a9c
>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f
>> 14 Epetra_MpiDistributor::Do()  7fffe4089773
>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a
>> 12 Epetra_DistObject::Export()  7fffe400b7b7
>> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f
>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3
>> The other (in my case three) processes are stuck in the head of
>> the if/else-if statement leading up to this point, namely in the line 
>> if (vector->Map().SameAs(v.vector->Map()) == false)
>> inside the call to SameAs(...) with stacks like
>> 15 opal_progress()  7fffdc56dfbc
>> 14 ompi_request_default_wait_all()  7fffddd54b15
>> 13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913
>> 12 PMPI_Allreduce()  7fffddd6587f
>> 11 Epetra_MpiComm::MinAll()  7fffe408739e
>> 10 Epetra_BlockMap::SameAs()  7fffe3fb9d74
>> Maybe this helps. Producing a smaller example will likely not be
>> possible in the coming two weeks but if there are no solutions
>> until then I can try.
>> Greetings,
>> Pascal

Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-15 Thread Pascal Kraft
Hi Martin,

that didn't solve my problem. What I have done in the meantime is replace 
the check in line 247 of trilinos_vector.cc with true. I don't know if this 
causes memory leaks or anything but my code seems to be working fine with 
that change. 
To your suggestion: Would I have also had to call the templated version for 
BlockVectors or only for Vectors? I only tried the latter. Would I have had 
to also apply some patch to my dealii library for it to work or is the 
patch you talked about simply that you included the functionality of the 
call 
GrowingVectorMemory::release_unused_memory() 
in some places?
I have also wanted to try MPICH instead of OpenMPI because of a post about 
an internal error in OpenMPI and one of the functions appearing in the call 
stacks sometimes not blocking properly.

Thank you for your time and your fast responses - the whole library and the 
people developing it and making it available are simply awesome ;)

Pascal

Am Mittwoch, 15. März 2017 17:26:23 UTC+1 schrieb Martin Kronbichler:
>
> Dear Pascal,
>
> This problem seems related to a problem we recently worked around in 
> https://github.com/dealii/dealii/pull/4043
>
> Can you check what happens if you call 
> GrowingVectorMemory::release_unused_memory()
>
> between your optimization steps? If a communicator gets stuck in those 
> places, it is likely a stale object somewhere that we fail to work around 
> for some reason.
>
> Best,
> Martin
>
> On 15.03.2017 14:10, Pascal Kraft wrote:
>
> Dear Timo, 
>
> I have done some more digging and found out the following. The problems 
> seem to happen in trilinos_vector.cc between the lines 240 and 270.
> What I see on the call stacks is that one process reaches line 261 ( ierr 
> = vector->GlobalAssemble (last_action); ) and then waits inside this call 
> at an MPI_Barrier with the following stack:
> 20  7fffd4d18f56 
> 19 opal_progress()  7fffdc56dfca 
> 18 ompi_request_default_wait_all()  7fffddd54b15 
> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
> 16 PMPI_Barrier()  7fffddd68a9c 
> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
> 14 Epetra_MpiDistributor::Do()  7fffe4089773 
> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
> 12 Epetra_DistObject::Export()  7fffe400b7b7 
> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f 
> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 
>
> The other (in my case three) processes are stuck in the head of the 
> if/else-if statement leading up to this point, namely in the line 
> if (vector->Map().SameAs(v.vector->Map()) == false)
> inside the call to SameAs(...) with stacks like
> 15 opal_progress()  7fffdc56dfbc 
> 14 ompi_request_default_wait_all()  7fffddd54b15 
> 13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913 
> 12 PMPI_Allreduce()  7fffddd6587f 
> 11 Epetra_MpiComm::MinAll()  7fffe408739e 
> 10 Epetra_BlockMap::SameAs()  7fffe3fb9d74 
> Maybe this helps. Producing a smaller example will likely not be possible 
> in the coming two weeks but if there are no solutions until then I can try.
> Greetings,
> Pascal

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-15 Thread Martin Kronbichler

Dear Pascal,

This problem seems related to a problem we recently worked around in 
https://github.com/dealii/dealii/pull/4043


Can you check what happens if you call 
GrowingVectorMemory::release_unused_memory()


between your optimization steps? If a communicator gets stuck in those 
places, it is likely a stale object somewhere that we fail to work around 
for some reason.
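
As an illustration of where such a call could go (a sketch only; solve_optimization_step() is a hypothetical stand-in for the per-step assembly and GMRES solve, and the template argument assumes the Trilinos MPI vector wrapper):

#include <deal.II/lac/vector_memory.h>
#include <deal.II/lac/trilinos_vector.h>

void solve_optimization_step(const unsigned int step);  // hypothetical user code: assemble + GMRES solve

void run_optimization(const unsigned int n_steps)
{
  for (unsigned int step = 0; step < n_steps; ++step)
    {
      solve_optimization_step(step);

      // Between steps, return all temporary vectors held by the pool so
      // that no cached vector carries a stale Epetra map / communicator
      // into the next step.
      dealii::GrowingVectorMemory<dealii::TrilinosWrappers::MPI::Vector>::release_unused_memory();
    }
}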


Best,
Martin


On 15.03.2017 14:10, Pascal Kraft wrote:

Dear Timo,

I have done some more digging and found out the following. The 
problems seem to happen in trilinos_vector.cc between the lines 240 
and 270.
What I see on the call stacks is that one process reaches line 261 
( ierr = vector->GlobalAssemble (last_action); ) and then waits inside 
this call at an MPI_Barrier with the following stack:

20  7fffd4d18f56
19 opal_progress()  7fffdc56dfca
18 ompi_request_default_wait_all()  7fffddd54b15
17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d
16 PMPI_Barrier()  7fffddd68a9c
15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f
14 Epetra_MpiDistributor::Do()  7fffe4089773
13 Epetra_DistObject::DoTransfer()  7fffe400a96a
12 Epetra_DistObject::Export()  7fffe400b7b7
11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f
10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3

The other (in my case three) processes are stuck in the head of the 
if/else-if statement leading up to this point, namely in the line
if (vector->Map().SameAs(v.vector->Map()) == false)

inside the call to SameAs(...) with stacks like
15 opal_progress()  7fffdc56dfbc
14 ompi_request_default_wait_all()  7fffddd54b15
13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913
12 PMPI_Allreduce()  7fffddd6587f
11 Epetra_MpiComm::MinAll()  7fffe408739e
10 Epetra_BlockMap::SameAs()  7fffe3fb9d74
Maybe this helps. Producing a smaller example will likely not be 
possible in the coming two weeks but if there are no solutions until 
then I can try.

Greetings,
Pascal


--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-15 Thread Pascal Kraft
Dear Timo,

I have done some more digging and found out the following. The problems 
seem to happen in trilinos_vector.cc between the lines 240 and 270.
What I see on the call stacks is that one process reaches line 261 ( ierr 
= vector->GlobalAssemble (last_action); ) and then waits inside this call 
at an MPI_Barrier with the following stack:
20  7fffd4d18f56 
19 opal_progress()  7fffdc56dfca 
18 ompi_request_default_wait_all()  7fffddd54b15 
17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
16 PMPI_Barrier()  7fffddd68a9c 
15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
14 Epetra_MpiDistributor::Do()  7fffe4089773 
13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
12 Epetra_DistObject::Export()  7fffe400b7b7 
11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f 
10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 

The other (in my case three) processes are stuck in the head of the 
if/else-if statement leading up to this point, namely in the line 
if (vector->Map().SameAs(v.vector->Map()) == false)
inside the call to SameAs(...) with stacks like

15 opal_progress()  7fffdc56dfbc 
14 ompi_request_default_wait_all()  7fffddd54b15 
13 ompi_coll_tuned_allreduce_intra_recursivedoubling()  7fffcf9a4913 
12 PMPI_Allreduce()  7fffddd6587f 
11 Epetra_MpiComm::MinAll()  7fffe408739e 
10 Epetra_BlockMap::SameAs()  7fffe3fb9d74 

Maybe this helps. Producing a smaller example will likely not be possible 
in the coming two weeks but if there are no solutions until then I can try.

Greetings,
Pascal
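
To make the failure mode these stacks suggest concrete, here is a minimal MPI-only sketch (illustrative, not deal.II or Epetra code): if a decision that must be collective diverges across ranks, one rank enters a barrier while the others enter an all-reduce, and both sides wait on each other forever.

// Minimal MPI sketch of the hang pattern above: rank 0 takes the
// "maps differ" path and blocks in MPI_Barrier (as inside GlobalAssemble),
// while the other ranks block in MPI_Allreduce (as inside
// Epetra_BlockMap::SameAs). Compile with mpicxx, run with >= 2 ranks,
// and the program never terminates.
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // In the real code this flag is the result of Map().SameAs(...), which
  // should be identical on all ranks; here it is forced to diverge.
  const bool maps_differ = (rank == 0);

  if (maps_differ)
    {
      MPI_Barrier(MPI_COMM_WORLD);  // GlobalAssemble()-like path
    }
  else
    {
      int local = 1, global = 0;
      MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);  // SameAs()-like path
    }

  MPI_Finalize();
  return 0;
}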

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-14 Thread Timo Heister
Pascal,

I have no idea why this is happening. I think you have to try to make
a minimal example that hangs so we can find out what the problem is. I
assume we incorrectly allocate/deallocate temporary vectors somewhere.

Are all processors stuck inside
> 9 dealii::TrilinosWrappers::MPI::Vector::reinit() trilinos_vector.cc:261 
> 752c937e
?


On Tue, Mar 14, 2017 at 2:43 PM, Pascal Kraft  wrote:
> By the way: After some time I see the additional function opal_progress() on
> top of the stack.
> Also here is what I use:
> gcc (GCC) 6.3.1 20170306
> openmpi 1.10.6-1
> trilinos-12.6.1
> dealii-8.4.1
> and my test cases consist of 4 MPI processes.
>
> Am Dienstag, 14. März 2017 19:29:09 UTC+1 schrieb Pascal Kraft:
>>
>> Dear list members,
>>
>> I am facing a really weird problem that I have been struggling with for a
>> while now. I have written a problem class which, based on other objects,
>> generates a system matrix, rhs and solution vector object. The
>> data structures are Trilinos block distributed types. When I do this for the
>> first time it all works perfectly. However, the class is part of an
>> optimization scheme, and usually the second time the object is used
>> (randomly also later, but this has only happened once or twice) the solver
>> does not start. I am checking with MPI barriers to see if all processes
>> arrive at the GMRES::solve and they do, but somehow not even my own
>> preconditioner's vmult method gets called anymore. The objects (the two
>> vectors and the system matrix) are exactly the same as they were at
>> the previous step (only slightly different numbers, but same vectors of
>> IndexSets for the partition among processors).
>>
>> I have debugged this code-segment with Eclipse and the parallel debugger
>> but don't know what to do with the call stack:
>> 18 ompi_request_default_wait_all()  7fffddd54b15
>> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d
>> 16 PMPI_Barrier()  7fffddd68a9c
>> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f
>> 14 Epetra_MpiDistributor::Do()  7fffe4089773
>> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a
>> 12 Epetra_DistObject::Export()  7fffe400b7b7
>> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f
>> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3
>> 9 dealii::TrilinosWrappers::MPI::Vector::reinit() trilinos_vector.cc:261
>> 752c937e
>> 8 dealii::TrilinosWrappers::MPI::BlockVector::reinit()
>> trilinos_block_vector.cc:191 74e43bd9
>> 7 dealii::internal::SolverGMRES::TmpVectors::operator() solver_gmres.h:535 4a847d
>> 6 dealii::SolverGMRES::solve<..., PreconditionerSweeping>() solver_gmres.h:813 4d654a
>> 5 Waveguide::solve() Waveguide.cpp:1279 48f150
>>
>> The last line (5) here is a function I wrote which calls
>> SolverGMRES::solve with my
>> preconditioner (which works perfectly fine during the previous run). I found
>> some information online about MPI_Barrier being unstable sometimes, but I
>> don't know enough about the inner workings of Trilinos (Epetra) and deal.II
>> to make a judgment call here. If no one can help I will try to provide a code
>> fragment, but I doubt that will be possible (if it really is a race
>> condition and I strip away the rather large amount of code surrounding this
>> segment, it is unlikely to be reproducible).
>>
>> Originally I had used two MPI communicators that only differed in the
>> numbering of the processes (one for the primal, one for the dual problem)
>> and created two independent objects of my problem class which only used their
>> respective communicator. In that case, the solver had only worked whenever
>> the numbering of processes was either equal to that of MPI_COMM_WORLD or
>> exactly the opposite, but not for, say, 1-2-3-4 -> 1-3-2-4, and it got stuck in
>> the exact same way. I had thought it might be some internal use of
>> MPI_COMM_WORLD that was blocking somehow, but it also happens now that I only
>> use one communicator (MPI_COMM_WORLD).
>>
>> Thank you in advance for your time,
>> Pascal Kraft
>

[deal.II] Re: Internal instability of the GMRES Solver / Trilinos

2017-03-14 Thread Pascal Kraft
By the way: After some time I see the additional function opal_progress() 
on top of the stack.
Also here is what I use:
gcc (GCC) 6.3.1 20170306
openmpi 1.10.6-1
trilinos-12.6.1
dealii-8.4.1
and my test cases consist of 4 MPI processes.

Am Dienstag, 14. März 2017 19:29:09 UTC+1 schrieb Pascal Kraft:
>
> Dear list members,
>
> I am facing a really weird problem that I have been struggling with for a 
> while now. I have written a problem class which, based on other objects, 
> generates a system matrix, rhs and solution vector object. The 
> data structures are Trilinos block distributed types. When I do this for the 
> first time it all works perfectly. However, the class is part of an 
> optimization scheme, and usually the second time the object is used 
> (randomly also later, but this has only happened once or twice) the solver 
> does not start. I am checking with MPI barriers to see if all processes 
> arrive at the GMRES::solve and they do, but somehow not even my own 
> preconditioner's vmult method gets called anymore. The objects (the two 
> vectors and the system matrix) are exactly the same as they were at 
> the previous step (only slightly different numbers, but same vectors of 
> IndexSets for the partition among processors).
>
> I have debugged this code-segment with Eclipse and the parallel debugger 
> but don't know what to do with the call stack:
> 18 ompi_request_default_wait_all()  7fffddd54b15 
> 17 ompi_coll_tuned_barrier_intra_recursivedoubling()  7fffcf9abb5d 
> 16 PMPI_Barrier()  7fffddd68a9c 
> 15 Epetra_MpiDistributor::DoPosts()  7fffe4088b4f 
> 14 Epetra_MpiDistributor::Do()  7fffe4089773 
> 13 Epetra_DistObject::DoTransfer()  7fffe400a96a 
> 12 Epetra_DistObject::Export()  7fffe400b7b7 
> 11 int Epetra_FEVector::GlobalAssemble()  7fffe4023d7f 
> 10 Epetra_FEVector::GlobalAssemble()  7fffe40228e3 
> 9 dealii::TrilinosWrappers::MPI::Vector::reinit() trilinos_vector.cc:261 
> 752c937e 
> 8 dealii::TrilinosWrappers::MPI::BlockVector::reinit() 
> trilinos_block_vector.cc:191 74e43bd9 
> 7 dealii::internal::SolverGMRES::TmpVectors::operator() solver_gmres.h:535 4a847d 
> 6 dealii::SolverGMRES::solve<..., PreconditionerSweeping>() solver_gmres.h:813 4d654a 
> 5 Waveguide::solve() Waveguide.cpp:1279 48f150 
>
> The last line (5) here is a function I wrote which calls 
> SolverGMRES::solve with my 
> preconditioner (which works perfectly fine during the previous run). I found 
> some information online about MPI_Barrier being unstable sometimes, but I 
> don't know enough about the inner workings of Trilinos (Epetra) and deal.II 
> to make a judgment call here. If no one can help I will try to provide a code 
> fragment, but I doubt that will be possible (if it really is a race 
> condition and I strip away the rather large amount of code surrounding this 
> segment, it is unlikely to be reproducible).
>
> Originally I had used two MPI communicators that only differed in the 
> numbering of the processes (one for the primal, one for the dual problem) 
> and created two independent objects of my problem class which only used 
> their respective communicator. In that case, the solver had only worked 
> whenever the numbering of processes was either equal to that of 
> MPI_COMM_WORLD or exactly the opposite, but not for, say, 1-2-3-4 -> 1-3-2-4, 
> and it got stuck in the exact same way. I had thought it might be some 
> internal use of MPI_COMM_WORLD that was blocking somehow, but it also 
> happens now that I only use one communicator (MPI_COMM_WORLD).
>
> Thank you in advance for your time,
> Pascal Kraft
>
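
For completeness, a minimal sketch of the kind of barrier check described in the quoted message (illustrative only; barrier_checkpoint and the solve call below are placeholders, not code from this thread):

#include <mpi.h>
#include <iostream>

// Print a per-rank checkpoint and then synchronize. If every rank prints
// "reached ..." but the program still hangs afterwards, the deadlock is
// inside the subsequent call (here: the GMRES solve), not before it.
void barrier_checkpoint(const MPI_Comm comm, const char *label)
{
  int rank = 0;
  MPI_Comm_rank(comm, &rank);
  std::cout << "rank " << rank << ": reached " << label << std::endl;
  MPI_Barrier(comm);  // returns only once all ranks have reached this point
}

// Hypothetical usage around the solver call:
//   barrier_checkpoint(MPI_COMM_WORLD, "before GMRES::solve");
//   solver.solve(system_matrix, solution, system_rhs, preconditioner);
//   barrier_checkpoint(MPI_COMM_WORLD, "after GMRES::solve");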

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.