Bob,
   The OLTP workload is the only workload that uses System V semaphores. 
When the runtime expires, some of the processes are blocked on 
semaphores and do not "see" the quit indication. Unfortunately, they 
never see it at all, because the processes holding the semaphores do 
see it and exit, leaving the waiters blocked. As far as I have been able 
to determine, there is no way for the main process to detect this 
directly. Instead, we have implemented a timeout system. It can take a 
few tens of seconds for the main process to figure out that child 
processes are hung and shoot them down. Doing so any more quickly risks 
other problems. Sometimes it does take a while, and I may need to tweak 
some of the timeouts, but I have never seen it fail to finish eventually.
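
   To illustrate the general idea of a timeout-based shutdown, here is a 
rough sketch in Python. It is not the filebench code (which is written in 
C), and the names shutdown_children, spawn, grace_seconds, and 
poll_interval are made up for the example. It only shows the pattern: 
give the children a grace period to exit on their own, then kill 
whatever is still running.

```python
import subprocess
import sys
import time


def shutdown_children(children, grace_seconds, poll_interval=0.05):
    """Wait up to grace_seconds for children to exit, then kill the rest.

    Returns the list of children that had to be killed.
    (Hypothetical helper for illustration; not part of filebench.)
    """
    deadline = time.monotonic() + grace_seconds
    pending = list(children)

    # Grace period: keep polling until everyone has exited or time runs out.
    while pending and time.monotonic() < deadline:
        # poll() returns None while the child is still running.
        pending = [c for c in pending if c.poll() is None]
        if pending:
            time.sleep(poll_interval)

    # Anything still running is presumed hung and is shot down.
    killed = []
    for child in pending:
        if child.poll() is None:
            child.kill()
            killed.append(child)
        child.wait()  # reap the child so it does not linger as a zombie
    return killed


def spawn(seconds):
    """Start a stand-in child process that just sleeps for `seconds`."""
    code = f"import time; time.sleep({seconds})"
    return subprocess.Popen([sys.executable, "-c", code])
```

   In this sketch a short grace period kills healthy children that are 
merely slow to exit, while a long one makes a hung run wait longer 
before it is cleaned up, which is the same trade-off as with the 
timeouts mentioned above.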

   If you waited several minutes and it still hadn't quit, then there 
may still be some sort of race that I missed. Please let me know more about 
the system you were testing on (x86 vs. SPARC, for instance), as 
problems sometimes show up on one architecture but not the other due 
to subtle endian dependencies that occasionally creep in. Also, my 
testing is on Nevada, though I don't see why Solaris 10 should be any 
different as far as this issue is concerned.

Drew


On 10/02/08 04:53, Bob Resendes wrote:
> I downloaded and installed the latest filebench code (1.3.4) on Solaris 10:
>
> bash-3.00# cat /etc/release
>                         Solaris 10 5/08 s10x_u5wos_10 X86
>            Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
>                         Use is subject to license terms.
>                              Assembled 24 March 2008
>
> I copied then edited the "filemacro" profile to produce the following oltp 
> configuration:
>
> DEFAULTS {
>         runtime = 120;
>         dir = /files;
>         stats = /files/filebench/stats;
>         filesystem = ufs;
>         description = "OLTP ufs";
> }
>
> CONFIG large_db_oltp_8k_cached {
>         personality = oltp;
>         function = generic;
>         cached = 1;
>         directio = 0;
>         iosize = 8k;
>         nshadows = 200;
>         ndbwriters = 10;
>         usermode = 20000;
>         filesize = 1g;
>         memperthread = 1m;
>         workingset = 0;
>         logfilesize = 10m;
>         nfiles = 10;
>         nlogfiles = 1;
> }
>
> I then run the program with the following output:
>
> bash-3.00# /opt/filebench/bin/filebench my_oltp      
> parsing profile for config: large_db_oltp_8k_cached
> [...SNIP...]
> 29433: 125.630: Stats dump to file 'stats.large_db_oltp_8k_cached.out'
> 29433: 125.630: in statsdump stats.large_db_oltp_8k_cached.out
> 29433: 128.291: Shutting down processes
>
> The filebench command just seems to be stuck at "Shutting down processes".
>
> bash-3.00# ptree 29433
> 9352  /usr/lib/ssh/sshd
>   29386 /usr/lib/ssh/sshd
>     29389 /usr/lib/ssh/sshd
>       29391 -sh
>         29398 bash
>           29427 /usr/bin/perl /opt/filebench/bin/filebench my_oltp
>             29433 /opt/filebench/bin/go_filebench -f /files/filebench/
>
> Checking truss indicates that it's just waiting:
>
> bash-3.00# truss -p 29433
> /2:     nanosleep(0xC23FEF80, 0xC23FEF88)               = 0
> /1:     lwp_mutex_timedlock(0xC2400A30, 0x00000000) (sleeping...)
> /2:     nanosleep(0xC23FEF80, 0xC23FEF88)               = 0
> /2:     nanosleep(0xC23FEF80, 0xC23FEF88)               = 0
> [etc., etc, ...]
>
> Killing process 29433 seems to let filebench complete (i.e. generate 
> reports). I'm assuming there's a bug, but I don't know enough about filebench 
> to know where to start looking, yet. Hoping this is already a known issue. I 
> didn't see anything via a quick forum search.
> --
> This message posted from opensolaris.org
> _______________________________________________
> perf-discuss mailing list
> [email protected]
>   

