Bug#888879: rheolef FTBFS on several architectures: test runs forever

2018-02-06 Thread Pierre Saramito
Hi Adrian and Andreas,

OK, no problem, I will look into this issue.

True, the "make check" test suite was not run before in the
Debian environment.


This "make check" test suite was developed for maintaining the
upstream source code: it runs a very long sequence of
non-regression tests. It runs via "mpirun" and may have
portability issues on some architectures. A possible workaround
would be to run the test suite in sequential mode:
   make MPIRUN="" check
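A minimal shell sketch of the sequential workaround, bounded by a time limit so that a hang is reported as a failure instead of waiting for the buildd's 150-minute inactivity kill. It assumes GNU coreutils `timeout` (which exits with status 124 when the limit is reached) and that the rheolef test Makefiles honour an empty `MPIRUN` as described above; the helper name `run_with_limit` and the two-hour limit are hypothetical.

```shell
#!/bin/sh
# run_with_limit SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of rheolef): run COMMAND under GNU coreutils
# "timeout" and print an explicit message if it hangs.
# Returns COMMAND's exit status, or 124 if the time limit was reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    # "|| status=$?" keeps the failure status without tripping "set -e".
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*" >&2
    fi
    return "$status"
}
```

For instance, `run_with_limit 7200 make MPIRUN="" check` would run the suite sequentially and stop it after two hours, so a hanging test shows up as a clear timeout in the build log.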

Could you please decrease the severity of this bug, e.g. to
normal, since nothing prevents the package from building?


I only have access to "amd64" architectures in my lab.

Could you please explain how I can access the other
machines/architectures used by the Debian team?
How can I get a login on these machines?

Thank you for your help maintaining the Debian package,

Pierre
--
pierre.saram...@imag.fr
Directeur de Recherche CNRS
Laboratoire Jean Kuntzmann, Grenoble, France
http://ljk.imag.fr/membres/Pierre.Saramito

- Original Message -
From: "Andreas Tille" <andr...@an3as.eu>
To: 888...@bugs.debian.org, "PIERRE SARAMITO" <pierre.saram...@imag.fr>
Sent: Tuesday, 6 February 2018 11:59:19
Subject: Re: Bug#79: rheolef FTBFS on several architectures: test runs forever

Hi Pierre,

as I mentioned on the Debian Science maintainers list[1], this problem
does not come as a surprise. Pierre, could you please confirm whether you
will keep on maintaining this package and look into this issue?

From my uneducated perspective, the test failures always existed, but the
tests were simply not run before. Thus a first course of action might be
to ask ftpmaster to remove the architectures where rheolef does not
build and to reduce the severity of the bug to, say, "important".

Please note that I do not have any time to work on this
package.

Kind regards

  Andreas.

[1] 
https://lists.alioth.debian.org/pipermail/debian-science-maintainers/2018-January/057544.html

On Tue, Jan 30, 2018 at 10:24:18PM +0200, Adrian Bunk wrote:
> [...]

Bug#888879: rheolef FTBFS on several architectures: test runs forever

2018-02-06 Thread Andreas Tille
Hi Pierre,

as I mentioned on the Debian Science maintainers list[1], this problem
does not come as a surprise. Pierre, could you please confirm whether you
will keep on maintaining this package and look into this issue?

From my uneducated perspective, the test failures always existed, but the
tests were simply not run before. Thus a first course of action might be
to ask ftpmaster to remove the architectures where rheolef does not
build and to reduce the severity of the bug to, say, "important".

Please note that I do not have any time to work on this
package.

Kind regards

  Andreas.

[1] 
https://lists.alioth.debian.org/pipermail/debian-science-maintainers/2018-January/057544.html

On Tue, Jan 30, 2018 at 10:24:18PM +0200, Adrian Bunk wrote:
> [...]

Bug#888879: rheolef FTBFS on several architectures: test runs forever

2018-01-30 Thread Adrian Bunk
Source: rheolef
Version: 6.7-5
Severity: serious

https://buildd.debian.org/status/package.php?p=rheolef=sid

...
  mpirun -np 1 ./form_mass_bdr_tst -app P2 -weight yz -I my_cube_TP-5-v2 left >/dev/null 2>/dev/null
  mpirun -np 2 ./form_mass_bdr_tst -app P2 -weight yz -I my_cube_TP-5-v2 left >/dev/null 2>/dev/null
  mpirun -np 3 ./form_mass_bdr_tst -app P2 -weight yz -I my_cube_TP-5-v2 left >/dev/null 2>/dev/null
  mpirun -np 1 ./form_mass_bdr_tst -app P2 -weight yz -I my_cube_TP-5-v2 right >/dev/null 2>/dev/null
  mpirun -np 2 ./form_mass_bdr_tst -app P2 -weight yz -I my_cube_TP-5-v2 right >/dev/null 2>/dev/null
E: Build killed with signal TERM after 150 minutes of inactivity


I've reproduced this on i386: two processes run forever at
100% CPU (aborted after 6 hours on a fast CPU).
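To narrow down which invocation hangs, each command from the build log can be rerun individually under a time limit. The sketch below assumes it is run from the rheolef test directory after `form_mass_bdr_tst` and the `my_cube_TP-5-v2` mesh have been built, and that GNU coreutils `timeout` is available; the function names `classify` and `probe_form_mass_bdr` and the 600-second limit are hypothetical.

```shell
#!/bin/sh
# classify SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND under "timeout", discard its output and
# print one word: "ok", "TIMEOUT" (timeout's exit status 124) or
# "failed(<status>)" for any other non-zero exit status.
classify() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "ok" ;;
        124) echo "TIMEOUT" ;;
        *)   echo "failed($status)" ;;
    esac
}

# probe_form_mass_bdr: hypothetical driver that reruns the form_mass_bdr_tst
# invocations seen in the build log, with 1 to 3 MPI processes on the
# "left" and "right" boundaries, printing one result line per invocation.
probe_form_mass_bdr() {
    for side in left right; do
        for np in 1 2 3; do
            result=$(classify 600 mpirun -np "$np" ./form_mass_bdr_tst \
                -app P2 -weight yz -I my_cube_TP-5-v2 "$side")
            echo "np=$np $side: $result"
        done
    done
}
```

Calling `probe_form_mass_bdr` would print lines of the form `np=2 right: TIMEOUT`, one per invocation, which separates a genuine deadlock from an ordinary test failure.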

Backtraces:

Thread 3 (Thread 0xf50ffb40 (LWP 29032)):
#0  0xf7ed6db9 in __kernel_vsyscall ()
#1  0xf70fabd3 in __GI___poll (fds=0xf47005d0, nfds=2, timeout=360) at ../sysdeps/unix/sysv/linux/poll.c:29
#2  0xf5caed4a in poll (__timeout=360, __nfds=2, __fds=0xf47005d0) at /usr/include/i386-linux-gnu/bits/poll2.h:46
#3  poll_dispatch (base=0x578eb9c0, tv=0xf50f9bfc) at poll.c:165
#4  0xf5ca59e9 in opal_libevent2022_event_base_loop (base=, flags=) at event.c:1630
#5  0xf5c6b3bd in progress_engine (obj=0x578eb950) at runtime/opal_progress_threads.c:105
#6  0xf5df6316 in start_thread (arg=0xf50ffb40) at pthread_create.c:465
#7  0xf7105296 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:108

Thread 2 (Thread 0xf5ac5b40 (LWP 29031)):
#0  0xf7ed6db9 in __kernel_vsyscall ()
#1  0xf71053fa in __GI_epoll_pwait (epfd=7, events=0x578ea930, maxevents=32, timeout=-1, set=0x0) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#2  0xf710569a in epoll_wait (epfd=7, events=0x578ea930, maxevents=32, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#3  0xf5ca199a in epoll_dispatch (base=0x578ea7a0, tv=0x0) at epoll.c:407
#4  0xf5ca59e9 in opal_libevent2022_event_base_loop (base=, flags=) at event.c:1630
#5  0xf5af23eb in progress_engine (obj=0x578ea7a0) at src/util/progress_threads.c:52
#6  0xf5df6316 in start_thread (arg=0xf5ac5b40) at pthread_create.c:465
#7  0xf7105296 in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:108

Thread 1 (Thread 0xf5b4fe00 (LWP 29002)):
#0  0xf7ed67f5 in ?? ()
#1  0xf7ed6b43 in __vdso_clock_gettime ()
#2  0xf7112961 in __GI___clock_gettime (clock_id=1, tp=0xffb74194) at ../sysdeps/unix/clock_gettime.c:115
#3  0xf5cc3297 in opal_timer_linux_get_usec_clock_gettime () at timer_linux_component.c:197
#4  0xf5c669c3 in opal_progress () at runtime/opal_progress.c:197
#5  0xf74b5e05 in sync_wait_st (sync=) at ../opal/threads/wait_sync.h:80
#6  ompi_request_default_wait_all (count=2, requests=0xffb742e4, statuses=0x0) at request/req_wait.c:221
#7  0xf750640d in ompi_coll_base_allreduce_intra_recursivedoubling (sbuf=0x57951030, rbuf=0x57a9b400, count=2, dtype=0xf7565140 , op=0xf7573e60 , comm=0xf7569520 , module=0x57976fa0) at base/coll_base_allreduce.c:225
#8  0xe991f640 in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x57951030, rbuf=0x57a9b400, count=2, dtype=0xf7565140 , op=0xf7573e60 , comm=0xf7569520 , module=0x57976fa0) at coll_tuned_decision_fixed.c:66
#9  0xf74c5b77 in PMPI_Allreduce (sendbuf=0x57951030, recvbuf=0x57a9b400, count=2, datatype=0xf7565140 , op=0xf7573e60 , comm=0xf7569520 ) at pallreduce.c:107
#10 0xf7b476cf in boost::mpi::detail::all_reduce_impl > (comm=..., in_values=0x57951030, n=n@entry=2, out_values=0x57a9b400) at /usr/include/boost/mpi/collectives/all_reduce.hpp:36
#11 0xf7b58fc0 in boost::mpi::all_reduce > (out_values=, n=2, in_values=, comm=..., op=...) at /usr/include/boost/mpi/collectives/all_reduce.hpp:93
#12 rheolef::mpi_assembly_begin >, rheolef::disarray_rep::message_type, rheolef::apply_iterator, rheolef::first_op > > (stash=..., first_stash_idx=..., last_stash_idx=..., ownership=..., receive=..., send=...) at ../../include/rheolef/mpi_assembly_begin.h:113
#13 0xf7b5a346 in rheolef::disarray_rep::dis_entry_assembly_begin (this=0x57acab70, my_set_op=...) at ../../include/rheolef/disarray_mpi.icc:223
#14 rheolef::disarray::dis_entry_assembly_begin (this=) at ../../include/rheolef/disarray.h:592
#15 rheolef::disarray::dis_entry_assembly (this=) at ../../include/rheolef/disarray.h:594
#16 rheolef::geo_rep::set_element_side_index (this=, side_dim=) at geo_mpi_get.cc:461
#17 0xf7b5f25a in rheolef::geo_rep::get (this=,