Bob,
The OLTP workload is the only workload that uses System V semaphores.
When the runtime expires, some of the processes are blocked on
semaphores and do not "see" the quit indication. Unfortunately, they
never see it at all, because the processes holding the semaphores do
see it and exit, leaving nothing to wake the waiters. As far as I have
been able to determine, there is no
way for the main process to directly detect this. Instead, we have
implemented a timeout system. It can take a few tens of seconds for the
main process to figure out that child processes are hung and shoot them
down. Doing so any more quickly risks other problems. Sometimes it does
take a while, and I may need to tweak some of the timeouts, but I have
never seen it fail to finish eventually.
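
To illustrate the timeout idea described above, here is a minimal sketch in
Python. It is not the filebench source (which is C), and every name in it
(HUNG_WORKER, CLEAN_WORKER, shut_down, demo, grace_seconds) is hypothetical.
The "hung" child waits on a semaphore that nobody will ever post, standing in
for a worker whose semaphore holder has already exited. The parent cannot
detect that state directly, so it gives all children a grace period and then
kills whatever is left.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a worker blocked on a semaphore whose holder has
# already exited: it waits on a semaphore that will never be posted.
HUNG_WORKER = "import threading; threading.Semaphore(0).acquire()"
# Hypothetical stand-in for a worker that sees the quit indication and exits.
CLEAN_WORKER = "pass"


def shut_down(workers, grace_seconds):
    """Wait up to grace_seconds in total, then kill any worker still running.

    Returns the PIDs of the workers that had to be killed.
    """
    deadline = time.monotonic() + grace_seconds
    killed = []
    for proc in workers:
        # All workers share one deadline, so the total wait stays bounded.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            proc.kill()   # the worker is presumed hung; shoot it down
            proc.wait()   # reap it so it does not linger as a zombie
            killed.append(proc.pid)
    return killed


def demo(grace_seconds=2.0):
    """Start one clean worker and two hung ones, then shut them all down."""
    workers = [
        subprocess.Popen([sys.executable, "-c", code])
        for code in (CLEAN_WORKER, HUNG_WORKER, HUNG_WORKER)
    ]
    hung_pids = [proc.pid for proc in workers[1:]]
    killed = shut_down(workers, grace_seconds)
    return hung_pids, killed, [proc.returncode for proc in workers]


if __name__ == "__main__":
    hung, killed, codes = demo()
    print("hung workers:", hung)
    print("killed after timeout:", killed)
    print("exit codes:", codes)
```

The trade-off is the one mentioned above: a short grace period risks killing
a worker that was about to exit on its own, while a long one makes shutdown
appear to stall.
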
If you waited several minutes and it still hadn't quit, then maybe
there is still some sort of race I missed. Please let me know more about
the system you were testing on (x86 vs. SPARC, for instance), as
sometimes problems show up on one architecture rather than the other due
to subtle endian dependencies that occasionally creep in. Also, my
testing is on Nevada, though I don't see why Solaris 10 should be any
different as far as this issue is concerned.
Drew
On 10/02/08 04:53, Bob Resendes wrote:
> I downloaded and installed the latest filebench code (1.3.4) on Solaris 10:
>
> bash-3.00# cat /etc/release
> Solaris 10 5/08 s10x_u5wos_10 X86
> Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
> Use is subject to license terms.
> Assembled 24 March 2008
>
> I copied then edited the "filemacro" profile to produce the following oltp
> configuration:
>
> DEFAULTS {
> runtime = 120;
> dir = /files;
> stats = /files/filebench/stats;
> filesystem = ufs;
> description = "OLTP ufs";
> }
>
> CONFIG large_db_oltp_8k_cached {
> personality = oltp;
> function = generic;
> cached = 1;
> directio = 0;
> iosize = 8k;
> nshadows = 200;
> ndbwriters = 10;
> usermode = 20000;
> filesize = 1g;
> memperthread = 1m;
> workingset = 0;
> logfilesize = 10m;
> nfiles = 10;
> nlogfiles = 1;
> }
>
> I then run the program with the following output:
>
> bash-3.00# /opt/filebench/bin/filebench my_oltp
> parsing profile for config: large_db_oltp_8k_cached
> [...SNIP...]
> 29433: 125.630: Stats dump to file 'stats.large_db_oltp_8k_cached.out'
> 29433: 125.630: in statsdump stats.large_db_oltp_8k_cached.out
> 29433: 128.291: Shutting down processes
>
> The filebench command just seems to be stuck at "Shutting down processes".
>
> bash-3.00# ptree 29433
> 9352 /usr/lib/ssh/sshd
> 29386 /usr/lib/ssh/sshd
> 29389 /usr/lib/ssh/sshd
> 29391 -sh
> 29398 bash
> 29427 /usr/bin/perl /opt/filebench/bin/filebench my_oltp
> 29433 /opt/filebench/bin/go_filebench -f /files/filebench/
>
> Checking truss indicates that it's just waiting:
>
> bash-3.00# truss -p 29433
> /2: nanosleep(0xC23FEF80, 0xC23FEF88) = 0
> /1: lwp_mutex_timedlock(0xC2400A30, 0x00000000) (sleeping...)
> /2: nanosleep(0xC23FEF80, 0xC23FEF88) = 0
> /2: nanosleep(0xC23FEF80, 0xC23FEF88) = 0
> [etc., etc, ...]
>
> Killing process 29433 seems to let filebench complete (i.e. generate
> reports). I'm assuming there's a bug, but I don't know enough about filebench
> to know where to start looking, yet. Hoping this is already a known issue. I
> didn't see anything via a quick forum search.
> --
> This message posted from opensolaris.org
> _______________________________________________
> perf-discuss mailing list
> [email protected]
>