Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server

2009-11-18 Thread Ethan Mallove
On Sat, Nov/07/2009 04:15:42PM, Jeff Squyres wrote:
> On Nov 6, 2009, at 5:18 PM, Ethan Mallove wrote:
>
>> I'm running multiple MTT clients out of the same scratch directory
>> using SGE. I'm running into race conditions between the multiple
>> clients, where one client is overwriting another's data in the .dump
>> files - which is a Very Bad Thing(tm). I'm running the
>> client/mtt-lock-server, and I've added the corresponding [Lock]
>> section in my INI file. Will my MTT clients now not interfere with
>> each other's .dump files? I'm skeptical of this because I don't see,
>> e.g., Lock() calls in SaveRuns(). How do I make my .dump files safe?
>>
>
>
> Err... perhaps this part wasn't tested well...?
>
> I'm afraid it's been forever since I've looked at this code and I'm gearing 
> up to leave for the Forum on Tuesday and then staying on for SC09, so it's 
> quite likely that you'll be able to look at this in more detail before I 
> will.  Sorry to pass the buck; just trying to be realistic...  :-(

After some digging, I discovered that MTT is not designed to execute
multiple INI sections out of a single scratch directory in parallel.
There's a ticket for this:

  https://svn.open-mpi.org/trac/mtt/ticket/167

The way around this limitation is to have MTT split up the .dump files
by INI section so that two MTT clients running simultaneously never
conflict with each other. (This change was not needed for the Test
Run .dump files, because MTT already splits them up.) I have attached
a patch that makes a simple wrapper script for #167 possible. The
changes should not disrupt normal (non-parallel) execution. Anyone
care to give it a try?
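A minimal, standalone Perl sketch of the scheme described above: each client writes only its own per-section file (`<base>-<section>.dump`), and readers glob all of them and merge the results. The names `save_section` and `load_all` are hypothetical, and `merge_hashes` here is an illustrative stand-in for `MTT::Util::merge_hashes`, whose implementation is not part of this excerpt. The merge rule (nested hashes merged recursively, any other duplicate key taking the value from the later file) is an assumption.

```perl
#!/usr/bin/perl
# Hypothetical sketch (not MTT code): one Data::Dumper file per INI
# section, merged back into a single hash when read.
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);

# Illustrative stand-in for MTT::Util::merge_hashes (assumed semantics):
# nested hashes are merged recursively; any other duplicate key takes
# the value from $right.
sub merge_hashes {
    my ($left, $right) = @_;
    my %merged = %{ $left || {} };
    foreach my $key (keys %{ $right || {} }) {
        if (ref($merged{$key}) eq 'HASH' && ref($right->{$key}) eq 'HASH') {
            $merged{$key} = merge_hashes($merged{$key}, $right->{$key});
        } else {
            $merged{$key} = $right->{$key};
        }
    }
    return \%merged;
}

# Each client writes only the file for its own section,
# e.g. "mpi_sources-<section>.dump", so writers never share a file.
sub save_section {
    my ($dir, $base, $section, $data) = @_;
    my $file = "$dir/$base-$section.dump";
    open(my $fh, '>', $file) or die "Cannot write $file: $!";
    print $fh Data::Dumper->Dump([$data], ['VAR1']);
    close($fh) or die "Cannot close $file: $!";
    return $file;
}

# Readers glob every per-section file and merge them in sorted order.
sub load_all {
    my ($dir, $base) = @_;
    my $merged = {};
    my @files = glob("$dir/$base-*.dump");
    foreach my $file (sort @files) {
        open(my $fh, '<', $file) or die "Cannot read $file: $!";
        my $text = do { local $/; <$fh> };
        close($fh);
        my $VAR1;
        eval $text;    # the file contains "$VAR1 = {...};"
        die "Bad dump file $file: $@" if $@;
        $merged = merge_hashes($merged, $VAR1);
    }
    return $merged;
}

my $dir = tempdir(CLEANUP => 1);
save_section($dir, 'mpi_sources', 'trunk', { trunk  => { version => 'trunk' } });
save_section($dir, 'mpi_sources', 'v1.3',  { 'v1.3' => { version => '1.3.2' } });
my $sources = load_all($dir, 'mpi_sources');
print join(',', sort keys %$sources), "\n";    # prints "trunk,v1.3"
```

Under these assumptions no two clients ever write the same file, so the overwrite race described above cannot occur. A reader could still see another client's file while it is being written, which this sketch does not guard against.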

-Ethan

>
> -- 
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
diff -r 8760c5d19838 -r 8a8663cb0ac3 lib/MTT/MPI.pm
--- a/lib/MTT/MPI.pm	Mon Nov 09 14:38:09 2009 -0500
+++ b/lib/MTT/MPI.pm	Wed Nov 18 11:07:37 2009 -0500
@@ -16,6 +16,8 @@

 use strict;
 use MTT::Files;
+use MTT::Messages;
+use MTT::Util;

 #--

@@ -28,10 +30,13 @@
 #--

 # Filename where list of MPI sources is kept
-my $sources_data_filename = "mpi_sources.dump";
+my $sources_data_filename = "mpi_sources";

 # Filename where list of MPI installs is kept
-my $installs_data_filename = "mpi_installs.dump";
+my $installs_data_filename = "mpi_installs";
+
+# Filename extension for all the Dumper data files
+my $data_filename_extension = "dump";

 #--

@@ -42,10 +47,15 @@
 # Explicitly delete anything that was there
 $MTT::MPI::sources = undef;

-# If the file exists, read it in
-my $data;
-MTT::Files::load_dumpfile("$dir/$sources_data_filename", \$data);
-$MTT::MPI::sources = $data->{VAR1};
+my @dumpfiles = glob("$dir/$sources_data_filename-*.$data_filename_extension");
+foreach my $dumpfile (@dumpfiles) {
+
+# If the file exists, read it in
+my $data;
+MTT::Files::load_dumpfile($dumpfile, \$data);
+$MTT::MPI::sources = MTT::Util::merge_hashes($MTT::MPI::sources, $data->{VAR1});
+
+}

 # Rebuild the refcounts
 foreach my $get_key (keys(%{$MTT::MPI::sources})) {
@@ -62,9 +72,14 @@
 #--

 sub SaveSources {
-my ($dir) = @_;
+my ($dir, $name) = @_;

-MTT::Files::save_dumpfile("$dir/$sources_data_filename", 
+# We write the entire MPI::sources hash to file, even
+# though the filename indicates a single INI section.
+# MTT::Util::merge_hashes will take care of duplicate
+# hash keys. The reason for splitting up the .dump files
+# is to keep them read- and write-safe across INI sections.
+MTT::Files::save_dumpfile("$dir/$sources_data_filename-$name.$data_filename_extension",
   $MTT::MPI::sources);
 }

@@ -76,10 +91,14 @@
 # Explicitly delete anything that was there
 $MTT::MPI::installs = undef;

-# If the file exists, read it in
-my $data;
-MTT::Files::load_dumpfile("$dir/$installs_data_filename", \$data);
-$MTT::MPI::installs = $data->{VAR1};
+my @dumpfiles = glob("$dir/$installs_data_filename-*.$data_filename_extension");
+foreach my $dumpfile (@dumpfiles) {
+
+# If the file exists, read it in
+my $data;
+MTT::Files::load_dumpfile($dumpfile, \$data);
+$MTT::MPI::installs = MTT::Util::merge_hashes($MTT::MPI::installs, $data->{VAR1});
+}

 # Rebuild the refcounts
 foreach my $get_key (keys(%{$MTT::MPI::installs})) {
@@ -106,9 +125,14 @@
 #--

 sub SaveInstalls {
-my ($dir) = @_;
+my ($dir, 

[MTT users] MTT trivial tests fails to complete on Centos5.3-x86_64 bit platform with OFED 1.5

2009-11-18 Thread Venkat Venkatsubra

From: Venkat Venkatsubra
Sent: Wednesday, November 18, 2009 12:54 PM
To: 'mtt-us...@open-mpi.org'
Subject: MTT trivial tests fails to complete on Centos5.3-x86_64 bit
platform with OFED 1.5

Hello All,

How do I debug this problem? Attached are the developer.ini and
trivial.ini files. I can provide any other information that you need.

[root@samples]# cat /etc/issue
CentOS release 5.3 (Final)
Kernel \r on an \m

[root@samples]# uname -a
Linux 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

I am running the OFED-1.5-20091029-0617 daily build.

I started the trivial tests using the following command:

[root@samples]# cat developer.ini trivial.ini | ../client/mtt --verbose -

 >> Initializing reporter module: TextFile
 *** Reporter initialized
 *** MPI Get phase starting
 >> MPI Get: [mpi get: my installation]
Checking for new MPI sources...
Using MPI in: /usr/mpi/gcc/openmpi-1.3.2/
 *** WARNING: alreadyinstalled_mpi_type was not specified, defaulting to
 "OMPI".
Got new MPI sources: version 1.3.2
 *** MPI Get phase complete
 *** MPI Install phase starting
 >> MPI Install [mpi install: my installation]
Installing MPI: [my installation] / [1.3.2] / [my installation]...
 >> Reported to text file
   /root/mtt-svn/samples/MPI_Install-my_installation-my_installation-1.3.2.html
 >> Reported to text file
   /root/mtt-svn/samples/MPI_Install-my_installation-my_installation-1.3.2.txt
Completed MPI Install successfully
 *** MPI Install phase complete
 *** Test Get phase starting
 >> Test Get: [test get: trivial]
Checking for new test sources...
Got new test sources
 *** Test Get phase complete
 *** Test Build phase starting
 >> Test Build [test build: trivial]
Building for [my installation] / [1.3.2] / [my installation] /
[trivial]
 >> Reported to text file
   /root/mtt-svn/samples/Test_Build-trivial-my_installation-1.3.2.html
 >> Reported to text file
   /root/mtt-svn/samples/Test_Build-trivial-my_installation-1.3.2.txt
Completed test build successfully
 *** Test Build phase complete
 *** Test Run phase starting
 >> Test Run [trivial]
 >> Running with [my installation] / [1.3.2] / [my installation]
Using MPI Details [open mpi] with MPI Install [my installation]

During this stage the test stalls.
After about 10 minutes the test gets killed.
dmesg on the machine where the test is running shows the following output:

 ==
 Dmesg output
 ==
 Out of memory: Killed process 5346 (gdmgreeter).
 audispd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

 Call Trace:
  [] out_of_memory+0x8e/0x2f5
  [] __alloc_pages+0x245/0x2ce
  [] __do_page_cache_readahead+0x95/0x1d9
  [] sock_readv+0xb7/0xd1
  [] __wake_up_common+0x3e/0x68
  [] filemap_nopage+0x148/0x322
  [] __handle_mm_fault+0x1f8/0xe5c
  [] do_page_fault+0x4cb/0x830
  [] error_exit+0x0/0x84

Thanks!

Venkat



developer.ini
Description: developer.ini


trivial.ini
Description: trivial.ini


Re: [MTT users] MTT trivial tests fails to complete on Centos5.3-x86_64 bit platform with OFED 1.5

2009-11-18 Thread Ethan Mallove
Could you run with --debug (instead of --verbose), and send the
output.

Thanks,
Ethan




Re: [MTT users] MTT trivial tests fails to complete on Centos5.3-x86_64 bit platform with OFED 1.5

2009-11-18 Thread Venkat Venkatsubra
Attached.

Debug is 1, Verbose is 1
*** MTT: ../client/mtt --debug -
*** Running on mughal
Chdir ../client
Chdir /root/mtt-svn/samples
Copying: stdin to /tmp/YrQTz71Lwq.ini
Expanding include_file(s) parameters in /tmp/YrQTz71Lwq.ini
Reading ini file: stdin
Validating INI inifile: /tmp/HJHhhN3BbC.ini
FilterINI: Final list of sections:
   [mtt]
   [mpi details: open mpi]
   [mpi get: my installation]
   [mpi install: my installation]
   [reporter: text file backup]
   [test get: trivial]
   [test build: trivial]
   [test run: trivial]
Value got: Config::IniFiles=HASH(0xfbb3540) MTT scratch
Value returning: 
scratch: .
sc