Yes. The below is the only failure I saw in that run, but I see the same problem ("floating point exception PASS") elsewhere ...
http://www.open-mpi.org/mtt/index.php?do_redir=487 -Ethan On Wed, Jan/02/2008 08:50:19PM, Jeff Squyres wrote: > BTW, was this the only failure you saw? > > On Jan 2, 2008, at 3:55 PM, Ethan Mallove wrote: > >> I fixed the below issue with Analyze/Performance/IMB.pm in >> r1117. What it will now do is read an uninterrupted data >> table broken up by either an EOF or something that does not >> look like a row in the data table (e.g., an error or warning >> message). I'm surprised that the below "floating point >> exception" could result in a pass. At least now the entire >> test run is not scrapped because of one bad apple. >> >> -Ethan >> >> >>> Sat, Dec/29/2007 05:38:32PM, jjhur...@open-mpi.org wrote: >>> >>> SQL QUERY: INSERT INTO latency_bandwidth >>> (latency_bandwidth_id, message_size, latency_min, latency_avg, >>> latency_max, bandwidth_min, bandwidth_avg, bandwidth_max) VALUES >>> ('314123', >>> '{0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,524288,1048576,2097152,4194304}', >>> >>> '{0.15,191.51,166.73,169.26,166.44,167.43,168.30,168.44,165.46,166.42,167.48,162.31,136.42,222.19,446.03,716.19,1254.33,2458.68,5584.21,12544.87,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,27091.03,43622.23,72144.60,130192.28}', >>> >>> '{0.22,191.53,166.78,169.27,166.45,167.45,168.30,168.60,165.47,166.46,167.64,162.32,136.45,222.21,446.09,716.21,1254.39,2458.76,5584.51,12545.59,,,,,,,,,,,,,,,,,27094.99,43640.46,72183.70,130302.65}', >>> >>> '{0.28,191.55,166.89,169.28,166.47,167.49,168.32,168.69,165.48,166.49,167.70,162.34,136.48,222.24,446.13,716.24,1254.44,2458.84,5584.96,12546.58,,,,,,,,,,,,,,,,,27099.48,43659.70,72207.25,130419.42}', >>> >>> DEFAULT, DEFAULT, DEFAULT) >>> SQL ERROR: ERROR: malformed array literal: >>> "{0.22,191.53,166.78,169.27,166.45,167.45,168.30,168.60,165.47,166.46,167.64,162.32,136.45,222.21,446.09,716.21,1254.39,2458.76,5584.51,12545.59,,,,,,,,,,,,,,,,,27094.99,43640.46,72183.70,130302.65}" >>> SQL ERROR: >>> >>> [SNIP] >>> >>> 'exit_value_81' => 0, >>> 'mpi_install_section_name_81' => 'ompi/gnu-standard', >>> 'latency_max_81' => >>> '{0.28,191.55,166.89,169.28,166.47,167.49,168.32,168.69,165.48,166.49,167.70,162.34,136.48,222.24,446.13,716.24,1254.44,2458.84,5584.96,12546.58,,,,,,,,,,,,,,,,,27099.48,43659.70,72207.25,130419.42}', >>> 'latency_avg_81' => >>> '{0.22,191.53,166.78,169.27,166.45,167.45,168.30,168.60,165.47,166.46,167.64,162.32,136.45,222.21,446.09,716.21,1254.39,2458.76,5584.51,12545.59,,,,,,,,,,,,,,,,,27094.99,43640.46,72183.70,130302.65}', >>> 'np_81' => '8', >>> 'network_81' => 'loopback,verbs', >>> 'test_result_81' => 1, >>> 'latency_min_81' => >>> '{0.15,191.51,166.73,169.26,166.44,167.43,168.30,168.44,165.46,166.42,167.48,162.31,136.42,222.19,446.03,716.19,1254.33,2458.68,5584.21,12544.87,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,27091.03,43622.23,72144.60,130192.28}', >>> 'test_build_section_name_81' => 'imb', >>> 'description_81' => 'Cisco MPI development cluster', >>> 'result_stderr_81' => '', >>> 'environment_81' => '', >>> 'exit_signal_81' => -1, >>> 'test_name_81' => 'Allgatherv', >>> 'parameters_81' => '--mca btl_openib_use_eager_rdma 0 --mca >>> btl_tcp_if_include ib0 --mca oob_tcp_if_include ib0', >>> 'start_timestamp_81' => 'Sat Dec 29 22:32:56 2007', >>> 'command_81' => 'mpirun -np 8 --mca btl_openib_use_eager_rdma 0 --mca >>> btl openib,self --mca btl_tcp_if_include ib0 --mca oob_tcp_if_include ib0 >>> src/IMB-MPI1 -npmin 8 Allgatherv', >>> 'duration_81' => '20 seconds', >>> 'message_size_81' => >>> '{0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,524288,1048576,2097152,4194304}', >>> 'resource_manager_81' => 'slurm', >>> 'result_stdout_81' => >>> '#--------------------------------------------------- >>> # Intel (R) MPI Benchmark Suite V2.3, MPI-1 part >>> #--------------------------------------------------- >>> # Date : Sat Dec 29 14:32:57 2007 >>> # Machine : x86_64# System : Linux >>> # Release : 2.6.9-42.ELsmp >>> # Version : #1 SMP Wed Jul 12 23:32:02 EDT 2006 >>> >>> # >>> # Minimum message length in bytes: 0 >>> # Maximum message length in bytes: 4194304 >>> # >>> # MPI_Datatype : MPI_BYTE >>> # MPI_Datatype for reductions : MPI_FLOAT >>> # MPI_Op : MPI_SUM >>> # >>> # >>> >>> # List of Benchmarks to run: >>> >>> # Allgatherv >>> >>> #---------------------------------------------------------------- >>> # Benchmarking Allgatherv >>> # #processes = 8 >>> #---------------------------------------------------------------- >>> #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] >>> 0 1000 0.15 0.28 0.22 >>> 1 1000 191.51 191.55 191.53 >>> 2 1000 166.73 166.89 166.78 >>> 4 1000 169.26 169.28 169.27 >>> 8 1000 166.44 166.47 166.45 >>> 16 1000 167.43 167.49 167.45 >>> 32 1000 168.30 168.32 168.30 >>> 64 1000 168.44 168.69 168.60 >>> 128 1000 165.46 165.48 165.47 >>> 256 1000 166.42 166.49 166.46 >>> 512 1000 167.48 167.70 167.64 >>> 1024 1000 162.31 162.34 162.32 >>> 2048 1000 136.42 136.48 136.45 >>> 4096 1000 222.19 222.24 222.21 >>> 8192 1000 446.03 446.13 446.09 >>> 16384 1000 716.19 716.24 716.21 >>> 32768 1000 1254.33 1254.44 1254.39 >>> 65536 640 2458.68 2458.84 2458.76 >>> 131072 320 5584.21 5584.96 5584.51 >>> 262144 160 12544.87 12546.58 12545.59 >>> [svbu-mpi031:12247] *** Process received signal *** >>> [svbu-mpi031:12247] Signal: Floating point exception (8) >>> [svbu-mpi031:12247] Signal code: (0) >>> [svbu-mpi031:12247] Failing at address: 0x25900002fed >>> [svbu-mpi031:12247] [ 0] /lib64/tls/libpthread.so.0 [0x2a95e57430] >>> [svbu-mpi031:12247] [ 1] /lib64/tls/libc.so.6(__poll+0x2f) [0x2a9601e96f] >>> [svbu-mpi031:12247] [ 2] >>> /home/mpiteam/scratches/2007-12-28/Ppuo/installs/5BvS/install/lib/libopen-pal.so.0(opal_poll_dispatch+0x13c) >>> >>> [0x2a9568e1ba] >>> [svbu-mpi031:12247] [ 3] >>> /home/mpiteam/scratches/2007-12-28/Ppuo/installs/5BvS/install/lib/libopen-pal.so.0(opal_event_base_loop+0x419) >>> >>> [0x2a9568a238] >>> [svbu-mpi031:12247] [ 4] >>> /home/mpiteam/scratches/2007-12-28/Ppuo/installs/5BvS/install/lib/libopen-pal.so.0(opal_event_loop+0x1d) >>> >>> [0x2a95689e1d] >>> [svbu-mpi031:12247] [ 5] >>> /home/mpiteam/scratches/2007-12-28/Ppuo/installs/5BvS/install/lib/libopen-pal.so.0(opal_progress+0x6a) >>> >>> [0x2a95680fbe] >>> [svbu-mpi031:12247] [ 6] mpirun [0x403fe4] >>> [svbu-mpi031:12247] [ 7] mpirun(orterun+0x9bb) [0x403823] >>> [svbu-mpi031:12247] [ 8] mpirun(main+0x1b) [0x402e63] >>> [svbu-mpi031:12247] [ 9] /lib64/tls/libc.so.6(__libc_start_main+0xdb) >>> [0x2a95f7d3fb] >>> [svbu-mpi031:12247] [10] mpirun(orte_daemon_recv+0x1e2) [0x402dba] >>> [svbu-mpi031:12247] *** End of error message *** >>> 524288 80 27091.03 27099.48 27094.99 >>> 1048576 40 43622.23 43659.70 43640.46 >>> 2097152 20 72144.60 72207.25 72183.70 >>> 4194304 10 130192.28 130419.42 130302.65 >>> ', >>> 'variant_81' => 81, >>> 'result_message_81' => 'Passed', >>> 'test_type_81' => 'latency_bandwidth', >>> 'launcher_81' => 'mpirun', >>> > > > -- > Jeff Squyres > Cisco Systems