Your message dated Mon, 22 Apr 2013 10:17:42 +0000
with message-id <[email protected]>
and subject line Bug#687266: fixed in aces3 3.0.6-7
has caused the Debian Bug report #687266,
regarding aces3: some jobs hang when run sequentially
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)

--
687266: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=687266
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---
Package: aces3
Version: 3.0.6-1
Severity: important

When I run the same job as mentioned in #687264 with only one process, the
job hangs in tran_rhf_ao_sv1.sio:

Gather on company_rank succeeded.
Static pre-defined array # 2 is first used on line 328
Allocated 800 bytes for static arrays.
Allocated 896466560 bytes for blkmgr.
Total memory usage 855 MBytes.
Max. possible usage 900 MBytes
Total blocks used = 65759

At this point, no more output is written and also no temporary files are
written or updated, while the xaces3 process spins at 100% CPU.

This is a representative backtrace:

0x000000000042e968 in one_pass_of_server () at sumz.c:439
439 MPI_Iprobe(MPI_ANY_SOURCE, readytag, newcomm, &flag, &status);
(gdb) bt
#0 0x000000000042e968 in one_pass_of_server () at sumz.c:439
#1 0x000000000042f73d in exec_thread_server_ (bflag=bflag@entry=0x729b20) at sumz.c:1248
#2 0x00000000004df2a4 in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:50
#3 0x000000000048b6a5 in compute_block (op=..., array_table=..., narray_table=198, index_table=..., nindex_table=32, block_map_table=..., nblock_map_table=55, segment_table=..., nsegment_table=43, scalar_table=..., nscalar_table=13, address_table=..., debugit=.FALSE., validate=.FALSE., flopcount=0, comm=3, comm_timer=95, instruction_timer=35) at compute_block.F:759
#4 0x00000000004d1e53 in optable_loop (optable=..., noptable=245, array_table=..., narray_table=198, array_labels=..., index_table=..., nindex_table=32, segment_table=..., nsegment_table=43, block_map_table=..., nblock_map_table=55, scalar_table=..., nscalar_table=13, proctab=..., address_table=..., debug=.FALSE., validate=.FALSE., comm=3, comm_timer=95, _array_labels=_array_labels@entry=10) at optable_loop.f:274
#5 0x00000000004423e5 in master.0.sip_fmain_init (__entry=1, ncompany_workers_min=<error reading variable: Cannot access memory at address 0x0>, ierr_return=<error reading variable: Cannot access memory at address 0x0>) at sip_fmain.F:582
#6 0x000000000042f8b8 in sumz_work_ (dryrun_flag=0x2, dryrun_flag@entry=0x7fff04aff4e8, fmbuffer=0xffff8002, fmbuffer@entry=0x23d0448c, dbg_flag=0x1, dbg_flag@entry=0x7fff04aff4e4, totalrecvbuffer=0x36c66) at sumz.c:1294
#7 0x0000000000423bea in worker_work () at worker_work.F:79
#8 0x000000000041a613 in aces3 () at beta.F:914
#9 0x000000000041959d in main (argc=<optimized out>, argv=<optimized out>) at beta.F:1014
#10 0x00007f0a0f6b4ead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x00000000004195c9 in _start ()

Another:

0x00007f0a122f0f63 in PMPI_Iprobe () from /usr/lib/libmpi.so.0
(gdb) bt
#0 0x00007f0a122f0f63 in PMPI_Iprobe () from /usr/lib/libmpi.so.0
#1 0x000000000042e9d0 in one_pass_of_server () at sumz.c:445
#2 0x000000000042f73d in exec_thread_server_ (bflag=bflag@entry=0x729b20) at sumz.c:1248
#3 0x00000000004df2a4 in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:50
#4 0x000000000048b6a5 in compute_block [...] at compute_block.F:759

And another:

0x00007f0a10c4e369 in opal_progress () from /usr/lib/libopen-pal.so.0
(gdb) bt
#0 0x00007f0a10c4e369 in opal_progress () from /usr/lib/libopen-pal.so.0
#1 0x00007f0a122cd9c9 in ?? () from /usr/lib/libmpi.so.0
#2 0x00007f0a122f84e3 in PMPI_Test () from /usr/lib/libmpi.so.0
#3 0x00007f0a1110e122 in pmpi_test__ () from /usr/lib/libmpi_f77.so.0
#4 0x00000000004df2bd in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:48
#5 0x000000000048b6a5 in compute_block [...] at compute_block.F:759

I did not encounter any other backtraces after a few more tries.

Michael
--- End Message ---
--- Begin Message ---
Source: aces3
Source-Version: 3.0.6-7

We believe that the bug you reported is fixed in the latest version of
aces3, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Michael Banck <[email protected]> (supplier of updated aces3 package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Mon, 22 Apr 2013 10:44:02 +0200
Source: aces3
Binary: aces3
Architecture: source amd64
Version: 3.0.6-7
Distribution: unstable
Urgency: high
Maintainer: Michael Banck <[email protected]>
Changed-By: Michael Banck <[email protected]>
Description:
 aces3 - Advanced Concepts in Electronic Structure III
Closes: 687264 687266
Changes:
 aces3 (3.0.6-7) unstable; urgency=high
 .
   [ Michael Banck ]
   * debian/patches/exit_impossible_seq_jobs_gracefully.patch: New patch,
     prints a helpful error message and quits if a job type is requested
     which cannot be run sequentially (Closes: #687266).
   * debian/patches/ignore_invalid_message_nind.patch: New patch, if a
     message of type ``server_barrier_signal'' or ``server_quit_msgtype''
     arrives, ignore if the value of the ``nind'' field is invalid
     (Closes: #687264).
Checksums-Sha1:
 e67c825dfea5f57e5b0dfe9d3ec0127966713aee 1317 aces3_3.0.6-7.dsc
 c1a678df89f7ab144e8405f1e59a9031e1c38906 10228 aces3_3.0.6-7.debian.tar.gz
 890dd1ec585b1bcfefb7bf80c51c783978ac1dae 12616738 aces3_3.0.6-7_amd64.deb
Checksums-Sha256:
 1200dba65209a07e644829ad8a8568c1a20c7cee6b06aa62b5df3c5ba5658036 1317 aces3_3.0.6-7.dsc
 fc08b500f2feade518747a042f8543b711cbcf2aa5c95d4149337a5ec0f812da 10228 aces3_3.0.6-7.debian.tar.gz
 75328de92a383f935729364fb3eafcef25e39a0993f565259dfdbd59445b4236 12616738 aces3_3.0.6-7_amd64.deb
Files:
 0ae44c4cd02a66dfbe160215bf5d27d8 1317 science optional aces3_3.0.6-7.dsc
 c5f8bab82e1821734b0b7698e80a810c 10228 science optional aces3_3.0.6-7.debian.tar.gz
 9a51ae2a71f434db4841220fe17628f5 12616738 science optional aces3_3.0.6-7_amd64.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlF0/LYACgkQmHaJYZ7RAb/gvACglmlaHnS0JQ0cg0wzIVV0fB7c
WIkAnRjI6TELx0qMnQ7oMpnwkat6kf49
=FcAu
-----END PGP SIGNATURE-----
--- End Message ---
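
Editorial illustration (not part of either message above): the changelog says the new patch "prints a helpful error message and quits if a job type is requested which cannot be run sequentially". The patch itself is not shown in this thread, so the following is only a minimal sketch of that idea, written in Python rather than the package's C/Fortran. The names `NEEDS_SERVER` and `check_job`, the wording of the message, and the assumption that the job from the report needs a second process are all hypothetical.

```python
# Hypothetical set of job types that need at least one additional
# (server) process; the entry is taken from the .sio name in the report,
# and whether it really belongs here is an assumption.
NEEDS_SERVER = {"tran_rhf_ao_sv1"}


def check_job(job_name: str, nprocs: int) -> None:
    """Refuse up front instead of letting the single process poll forever.

    Raises SystemExit with a readable message when a job that needs a
    server process is started with fewer than two processes.
    """
    if nprocs < 2 and job_name in NEEDS_SERVER:
        raise SystemExit(
            f"error: job '{job_name}' cannot be run sequentially; "
            "start it with at least 2 processes"
        )


if __name__ == "__main__":
    check_job("tran_rhf_ao_sv1", 2)  # passes silently
    check_job("tran_rhf_ao_sv1", 1)  # exits with the error message
```

The point of checking before any work starts is that the failure becomes an immediate, explained exit rather than the 100% CPU spin in the polling loop seen in the backtraces.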

