On Mon, 2008-05-05 at 16:43 +0800, Lin Feng Shen wrote:
> 
> Signed-off-by: Lin Feng Shen <[EMAIL PROTECTED]> 


Thanks, Lin. I will wait for comments on the list about this patch.

Regards--
Subrata
> 
> Thanks & Best regards, 
> 
> ----------
> Lin Feng Shen 沈林峰
> 
> Linux for System p Test, China Systems & Technology Lab
> China Development Labs, Beijing
> Tel: 86-10-82452244 Ext. 53535  Fax: 2312
> Email: [EMAIL PROTECTED]
> Address: 5F, De Shi Building, No.9, Shangdi East Road, Haidian District, Beijing, P.R.China 100085 
> 
> 
> From: Subrata Modak <[EMAIL PROTECTED]>
> Date: 05-05-08 02:41 PM
> Please respond to: [EMAIL PROTECTED]
> To: ltp-list <ltp-list@lists.sourceforge.net>
> Cc: Lin Feng Shen/China/[EMAIL PROTECTED], supriyak <[EMAIL PROTECTED]>
> Subject: [PATCH] Arbitrary usleep time in LTP hugeshmctl01 results in incorrect execution order
> 
> Hi all,
> 
> Please see below a description of a problem with the hugeshmctl01 test
> case in LTP, and the corresponding solution:
> 
> =================================================================
> Problem Description: Lin Feng Shen
> =================================================================
> I am testing hugetlb with ltp-full-20080430. The test cases under
> ${LTPROOT}/testcases/kernel/mem/hugetlb/ are executed one by one,
> repeatedly. The tests run fine for the first few hundred loops, but
> after hugeshmctl01 fails for the first time, several other cases start
> failing frequently as well.
> 
> ---------------- Here is the STAF status -----------------
> $> /proc/sys/kernel # gss
> Hostname          : 
> Kernel            : 2.6.16.60-0.17-ppc64
> Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
> Distribution      : SUSE
>     --------
>     
> 
>          BASE Start Time:  Fri May 2 14:32:06 CDT 2008
>          Snapshot Time: Sun May  4 03:48:38 CDT 2008
>          --------
>          hugemmap01 (0)-local;944;7858;8802
>          hugemmap02 (0)-local;8802;0;8802
>          hugemmap03 (0)-local;8801;0;8802
>          hugemmap04 (0)-local;908;7893;8801
>          hugeshmat01 (0)-local;945;7857;8802
>          hugeshmat02 (0)-local;909;7893;8802
>          hugeshmat03 (0)-local;945;7857;8802
>          hugeshmctl01 (0)-local;943;7859;8802
>          hugeshmctl02 (0)-local;908;7894;8802
>          hugeshmctl03 (0)-local;944;7858;8802
>          hugeshmdt01 (0)-local;944;7858;8802
>          hugeshmget01 (0)-local;945;7857;8802
>          hugeshmget02 (0)-local;8802;0;8802
>          hugeshmget03 (0)-local;8802;0;8802
>          hugeshmget05 (0)-local;945;7857;8802
>                               --pass--fail--unused
> 
> ---------------- Here is the ltp log ----------------
> The first failure is hugeshmctl01.
> 
> hugeshmctl01    1  FAIL  :  # of attaches is incorrect - 3
> hugeshmctl01    2  PASS  :  pid, size, # of attaches and mode are correct - pass #2
> hugeshmctl01    3  PASS  :  new mode and change time are correct
> hugeshmctl01    4  PASS  :  shared memory appears to be removed
> 
> ------- Here is the meminfo -------
> before hugeshmctl01 fails:
> 
> clashlp1:~ # cat /proc/meminfo | tail -4
> HugePages_Total:    32
> HugePages_Free:     32
> HugePages_Rsvd:      0
> Hugepagesize:    16384 kB
> clashlp1:~ #
> 
> after hugeshmctl01 fails:
> 
> clashlp1:~ # cat /proc/meminfo | tail -4
> HugePages_Total:    32
> HugePages_Free:     30
> HugePages_Rsvd:     30
> Hugepagesize:    16384 kB
> clashlp1:~ #
> -------------------------------------
> 
> It seems that hugeshmctl01 does not free some hugetlb pages when it
> fails. ps shows that an instance of hugeshmctl01 is still left over even
> though the test is no longer running, and this leftover process may
> still have hugetlb pages attached.
> -------------------------------------
> clashlp1:~ # ps ax  | grep huge
> 14166 pts/23   S+     0:00 grep huge
> 29360 ?        S      0:00 hugeshmctl01
> clashlp1:~ #
> -------------------------------------
> 
> The problem is caused by the arbitrary usleep() time in hugeshmctl01,
> which can result in an incorrect execution order. The sleep is meant to
> ensure that the children call shmat() and pause() before the parent
> checks the shm status and calls stat_cleanup(). But there is no
> guarantee that this sleep is always long enough.
> ------------
>         /* sleep briefly to ensure correct execution order */
>         usleep(250000);
> ------------
> 
> In the failure above, the last child process forked by the parent may
> not be scheduled to run and call shmat() immediately after it is
> created. When the parent checks the shm status, it finds only 3
> children attached to the shm instead of 4, so it reports the failure.
> It then calls stat_cleanup() to send SIGUSR1 to all children, but since
> the last child has not called pause() yet, its SIGUSR1 is handled
> before pause(). When that child later calls pause(), there is no
> further signal to wake it up, so it sleeps forever.
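The lost-wakeup race described above can be reproduced deterministically with a small standalone program. This is a minimal sketch, not code from hugeshmctl01: the function name demo_lost_wakeup(), the pipe used to hold the child back (a stand-in for "not scheduled yet"), and the one-second alarm() watchdog are all illustrative assumptions. The parent sends SIGUSR1 first and only then lets the child proceed to pause(), so the handler has already run and nothing wakes the child; the watchdog kills it instead of letting it hang forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that does nothing: the signal is consumed and forgotten. */
static void noop_handler(int sig)
{
    (void)sig;
}

/*
 * Hypothetical demo. Returns 1 if the child got stuck in pause() (killed
 * by the SIGALRM watchdog), 0 if it exited normally, -1 on setup failure.
 */
int demo_lost_wakeup(void)
{
    struct sigaction sa, oldsa;
    int fds[2], status;
    char c;
    pid_t pid, w;

    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &oldsa) == -1)
        return -1;
    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        close(fds[1]);
        alarm(1); /* watchdog: default SIGALRM action terminates the child */
        /* Stand-in for "not scheduled yet": wait for the parent's go-ahead.
         * read() may fail with EINTR when SIGUSR1 arrives; just retry. */
        while (read(fds[0], &c, 1) != 1)
            ;
        pause(); /* SIGUSR1 was already handled, so nothing wakes us up */
        _exit(0);
    }

    close(fds[0]);
    kill(pid, SIGUSR1); /* like stat_cleanup(): signal the child first */
    if (write(fds[1], "x", 1) != 1) {
        /* the child's read() loop never succeeds; the watchdog ends it */
    }
    close(fds[1]);
    w = waitpid(pid, &status, 0);
    sigaction(SIGUSR1, &oldsa, NULL);
    if (w != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM;
}
```

A return value of 1 means the child sat in pause() until the watchdog fired, the same state as the leftover hugeshmctl01 process in the ps output above.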
> =================================================================
> Patch: Lin Feng Shen
> =================================================================
> Patch to ensure that the children can receive and handle SIGUSR1 from
> the parent in pause()
> 
> The patch does not change the usleep() time, because any value would be
> arbitrary (although a longer time is more acceptable). Instead, the
> patch uses sigprocmask() to block SIGUSR1 before the children wait for
> SIGUSR1 from the parent, and then calls sigsuspend() to atomically
> unblock SIGUSR1 and wait for it. This way the children avoid sleeping
> forever and keeping the shm attached indefinitely, which would otherwise
> affect the other hugetlb tests.
> 
> In the parent process, another sigprocmask() call is made before
> usleep(). This has the same effect as sleeping for a longer time.
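The block-then-sigsuspend() technique can be sketched as follows. This is not the attached patch (which is not reproduced in this thread); run_children(), MAX_CHILDREN and the alarm() watchdog are hypothetical, and the shmat() step is omitted. One design choice differs from the description above: SIGUSR1 is blocked in the parent before fork(), so every child inherits the blocked mask and a signal sent at any moment stays pending until sigsuspend() atomically unblocks it. The loop on got_usr1 puts the child back to sleep if sigsuspend() returns because of some other signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64 /* hypothetical limit for this sketch */

/* Written only from the handler; sig_atomic_t makes that safe. */
static volatile sig_atomic_t got_usr1 = 0;

static void usr1_handler(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Child side: in hugeshmctl01 the shmat() would happen before this wait. */
static void child_wait_for_usr1(const sigset_t *waitmask)
{
    alarm(10); /* watchdog for the sketch only: die instead of hanging */
    while (!got_usr1)
        sigsuspend(waitmask); /* atomically unblock SIGUSR1 and sleep */
    _exit(0);
}

/* Returns how many children exited with status 0, or -1 on bad input. */
int run_children(int nchildren)
{
    struct sigaction sa, oldsa;
    sigset_t block, oldmask, waitmask;
    pid_t pids[MAX_CHILDREN];
    int i, status, ok = 0;

    if (nchildren < 0 || nchildren > MAX_CHILDREN)
        return -1;

    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, &oldsa);

    /* Block SIGUSR1 before fork(): children inherit the mask, so an early
     * SIGUSR1 stays pending instead of being lost before the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    /* Mask used while sleeping: the current mask minus SIGUSR1. */
    sigprocmask(SIG_BLOCK, NULL, &waitmask);
    sigdelset(&waitmask, SIGUSR1);

    for (i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            child_wait_for_usr1(&waitmask);
    }

    /* No usleep(): signal right away, the worst case for plain pause(). */
    for (i = 0; i < nchildren; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGUSR1);

    for (i = 0; i < nchildren; i++)
        if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;

    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    sigaction(SIGUSR1, &oldsa, NULL);
    return ok;
}
```

Because the parent signals immediately after fork(), with no sleep at all, this exercises exactly the ordering that broke the original code; every child should still exit, so run_children(4) is expected to return 4.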
> 
> With this patch applied, I have not seen the problem again.
> --------------------------
> Kernel            : 2.6.16.60-0.17-ppc64
> Kernel Build Date : Tue Apr 22 07:28:35 UTC 2008
> Distribution      : SUSE
>     --------
>     
>          BASE Start Time:  Sun May 4 20:26:11 CDT 2008
>          Snapshot Time: Mon May  5 00:05:21 CDT 2008
>          --------
>          hugemmap01 (0)-local;803;0;80
>          hugemmap02 (0)-local;803;0;80
>          hugemmap03 (0)-local;803;0;80
>          hugemmap04 (0)-local;803;0;80
>          hugeshmat01 (0)-local;803;0;80
>          hugeshmat02 (0)-local;803;0;80
>          hugeshmat03 (0)-local;803;0;80
>          hugeshmctl01 (0)-local;803;0;80
>          hugeshmctl02 (0)-local;803;0;80
>          hugeshmctl03 (0)-local;803;0;80
>          hugeshmdt01 (0)-local;803;0;80
>          hugeshmget01 (0)-local;803;0;80
>          hugeshmget02 (0)-local;803;0;80
>          hugeshmget03 (0)-local;803;0;80
>          hugeshmget05 (0)-local;803;0;80
> =================================================================
> End Description & Solution
> =================================================================
> 
> Please let us know whether any of you face the same problem, and
> whether the patch solves it for you too.
> 
> Regards--
> Subrata
> [attachment "05_05_2008-([EMAIL PROTECTED])-hugeshmctl01.patch"
> deleted by Lin Feng Shen/China/IBM]  


_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list
