Re: sem_otime trashing
On Sat, 2013-06-01 at 21:02 +0200, Manfred Spraul wrote:
> Hi Rik,
>
> I finally managed to get EFI boot, i.e. I'm now able to test on my i3
> (2core+HT).
>
> With semscale (i.e.: just overhead, perform semop=0 operations), the
> scalability from 1 to 2 cores is good, but not linear:
>
> # semscale 10 | grep "interleave 2"
> Cpus 1, interleave 2 delay 0: 35502103 in 10 secs
> Cpus 2, interleave 2 delay 0: 53990954 in 10 secs
> ---
> +53% when adding the 2nd core
> (interleave 2 to force to use different cores)
>
> Did you consider moving sem_otime into the individual semaphores?
> I did that (gross patch attached), and the performance is significantly
> better:
>
> # semscale 10 | grep "interleave 2"
> Cpus 1, interleave 2 delay 0: 35585634 in 10 secs
> Cpus 2, interleave 2 delay 0: 70410230 in 10 secs
> ---
> +99% scalability when adding the 2nd core
>
> Unfortunately I won't be able to read my mails next week, but the effect
> was too significant not to share it immediately.

64 core box.

Previous numbers:

vogelweide:/abuild/mike/:[0]# uname -r
3.8.13-rt9-rtm
vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 33553800, ops/sec 1118460

New numbers:

vogelweide:/abuild/mike/:[0]# !./semop-multi
./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 129474934, ops/sec 4315831

But, the box RCU-stalled on me. It's looking like the scalability patches are
a bit racy RCU-wise in an -rt kernel (oh dear). So, build as plain old
PREEMPT again, eliminate -rt funnies.

Previous numbers:

vogelweide:/abuild/mike/:[0]# ./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 22053968, ops/sec 735132
vogelweide:/abuild/mike/:[0]# ./osim 64 256 100 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 1.858765 seconds for 1000192 loops
per loop execution time: 1.858 usec

New numbers:

vogelweide:/abuild/mike/:[0]# !./semop
./semop-multi 256 64
cpus 64, threads: 256, semaphores: 64, test duration: 30 secs
total operations: 45521478, ops/sec 1517382
vogelweide:/abuild/mike/:[0]# !./osim
./osim 64 256 100 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 0.350682 seconds for 1000192 loops
per loop execution time: 0.350 usec

(1.8 -> 0.3?.. box, you ain't a race horse, you're a plow horse)

vogelweide:/abuild/mike/:[0]# ./osim 64 256 100 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 0.276405 seconds for 1000192 loops
per loop execution time: 0.276 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 100 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 0.370041 seconds for 1000192 loops
per loop execution time: 0.369 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 100 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 3907 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 0.502396 seconds for 1000192 loops
per loop execution time: 0.502 usec (runtime)
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1000 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 39063 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 3.354423 seconds for 10000128 loops
per loop execution time: 0.335 usec
vogelweide:/abuild/mike/:[0]# ./osim 64 256 1 0 0
osim <sems> <tasks> <loops> <busy-in> <busy-out>
osim: using a semaphore array with 64 semaphores.
osim: using 256 tasks.
osim: each thread loops 390625 times
osim: each thread busyloops 0 loops outside and 0 loops inside.
total execution time: 41.180479 seconds for 100000000 loops
per loop execution time: 0.411 usec

Box likes your idea.
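[The semop-multi and osim sources are not quoted in this thread. As a rough, hypothetical
illustration of the kind of load semop-multi puts on sem_otime -- the file name, thread
counts and program structure below are assumptions, not the actual benchmark -- here is a
minimal user-space sketch. Every semop() forces the kernel to record sem_otime, which
before the patch was a single field shared by all semaphores in the array.]

/*
 * sembench.c -- hypothetical sketch, NOT the actual semop-multi source.
 * N threads hammer a SysV semaphore array with semop() pairs; each call
 * makes the kernel update sem_otime.
 * Build: gcc -O2 -pthread sembench.c -o sembench
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define NSEMS     64		/* matches "./semop-multi 256 64" above */
#define NTHREADS  256
#define DURATION  30		/* seconds */

static int semid;
static volatile int stop;

/* pad per-thread counters to a cacheline so the harness itself doesn't
 * introduce the very false sharing it is trying to measure */
static struct { unsigned long long ops; char pad[56]; } counts[NTHREADS];

static void *worker(void *arg)
{
	long id = (long)arg;
	struct sembuf up   = { .sem_num = id % NSEMS, .sem_op = +1 };
	struct sembuf down = { .sem_num = id % NSEMS, .sem_op = -1 };

	while (!stop) {
		semop(semid, &up, 1);	/* each call updates sem_otime */
		semop(semid, &down, 1);
		counts[id].ops += 2;
	}
	return NULL;
}

int main(void)
{
	pthread_t thr[NTHREADS];
	unsigned long long total = 0;
	long i;

	semid = semget(IPC_PRIVATE, NSEMS, IPC_CREAT | 0600);
	if (semid < 0) {
		perror("semget");
		return 1;
	}
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&thr[i], NULL, worker, (void *)i);
	sleep(DURATION);
	stop = 1;
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(thr[i], NULL);
		total += counts[i].ops;
	}
	printf("total operations: %llu, ops/sec %llu\n",
	       total, total / DURATION);
	semctl(semid, 0, IPC_RMID);
	return 0;
}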
sem_otime trashing
Hi Rik,

I finally managed to get EFI boot, i.e. I'm now able to test on my i3
(2core+HT).

With semscale (i.e.: just overhead, perform semop=0 operations), the
scalability from 1 to 2 cores is good, but not linear:

# semscale 10 | grep "interleave 2"
Cpus 1, interleave 2 delay 0: 35502103 in 10 secs
Cpus 2, interleave 2 delay 0: 53990954 in 10 secs
---
+53% when adding the 2nd core
(interleave 2 to force to use different cores)

Did you consider moving sem_otime into the individual semaphores?
I did that (gross patch attached), and the performance is significantly
better:

# semscale 10 | grep "interleave 2"
Cpus 1, interleave 2 delay 0: 35585634 in 10 secs
Cpus 2, interleave 2 delay 0: 70410230 in 10 secs
---
+99% scalability when adding the 2nd core

Unfortunately I won't be able to read my mails next week, but the effect
was too significant not to share it immediately.

--
    Manfred

diff --git a/Makefile b/Makefile
index 73e20db..42137ab 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 3
 PATCHLEVEL = 10
 SUBLEVEL = 0
-EXTRAVERSION = -rc3
+EXTRAVERSION = -rc3-otime
 NAME = Unicycling Gorilla

 # *DOCUMENTATION*
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 55e17f6..976ce3a 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -12,7 +12,6 @@ struct task_struct;
 struct sem_array {
 	struct kern_ipc_perm	____cacheline_aligned_in_smp sem_perm;	/* permissions .. see ipc.h */
-	time_t			sem_otime;	/* last semop time */
 	time_t			sem_ctime;	/* last change time */
 	struct sem		*sem_base;	/* ptr to first semaphore in array */
 	struct list_head	pending_alter;	/* pending operations */
diff --git a/ipc/sem.c b/ipc/sem.c
index 1dbb2fa..e5f000f 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -92,6 +92,7 @@
 /* One semaphore structure for each semaphore in the system. */
 struct sem {
+	char	filler[64];
 	int	semval;		/* current value */
 	int	sempid;		/* pid of last operation */
 	spinlock_t	lock;	/* spinlock for fine-grained semtimedop */
@@ -99,7 +100,8 @@ struct sem {
 					/* that alter the semaphore */
 	struct list_head pending_const;	/* pending single-sop operations */
 					/* that do not alter the semaphore*/
-};
+	time_t	sem_otime;	/* candidate for sem_otime */
+} ____cacheline_aligned_in_smp;

 /* One queue for each sleeping process in the system.
  */
 struct sem_queue {
@@ -919,8 +921,14 @@ static void do_smart_update(struct sem_array *sma, struct sembuf *sops, int nsop
 			}
 		}
 	}
-	if (otime)
-		sma->sem_otime = get_seconds();
+	if (otime) {
+		if (sops == NULL) {
+			sma->sem_base[0].sem_otime = get_seconds();
+		} else {
+			sma->sem_base[sops[0].sem_num].sem_otime =
+							get_seconds();
+		}
+	}
 }

@@ -1066,6 +1074,21 @@ static unsigned long copy_semid_to_user(void __user *buf, struct semid64_ds *in,
 	}
 }

+static time_t get_semotime(struct sem_array *sma)
+{
+	int i;
+	time_t res;
+
+	res = sma->sem_base[0].sem_otime;
+	for (i = 1; i < sma->sem_nsems; i++) {
+		time_t to = sma->sem_base[i].sem_otime;
+
+		if (to > res)
+			res = to;
+	}
+	return res;
+}
+
 static int semctl_nolock(struct ipc_namespace *ns, int semid,
 			 int cmd, int version, void __user *p)
 {
@@ -1139,9 +1162,9 @@ static int semctl_nolock(struct ipc_namespace *ns, int semid,
 		goto out_unlock;

 	kernel_to_ipc64_perm(&sma->sem_perm, &tbuf.sem_perm);
-	tbuf.sem_otime = sma->sem_otime;
-	tbuf.sem_ctime = sma->sem_ctime;
-	tbuf.sem_nsems = sma->sem_nsems;
+	tbuf.sem_otime = get_semotime(sma);
+	tbuf.sem_ctime = sma->sem_ctime;
+	tbuf.sem_nsems = sma->sem_nsems;
 	rcu_read_unlock();
 	if (copy_semid_to_user(p, &tbuf, version))
 		return -EFAULT;
@@ -2029,6 +2052,9 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 {
 	struct user_namespace *user_ns = seq_user_ns(s);
 	struct sem_array *sma = it;
+	time_t sem_otime;
+
+	sem_otime = get_semotime(sma);

 	return seq_printf(s,
 			  "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
@@ -2040,7 +2066,7 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 			  from_kgid_munged(user_ns, sma->sem_perm.gid),
 			  from_kuid_munged(user_ns, sma->sem_perm.cuid),
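[With sem_otime removed from struct sem_array, the value user space sees has to be
reassembled: get_semotime() above takes the maximum of the per-semaphore times, and both
semctl(IPC_STAT) and /proc/sysvipc/sem go through it. Below is a minimal user-space sketch
of the externally visible behaviour the patch has to preserve -- plain SysV calls only,
nothing patch-specific; the file name is arbitrary.]

/* otime-check.c: do one semop on one semaphore, then read back the
 * array-wide sem_otime via IPC_STAT -- the value the kernel now has to
 * compute as the max over all per-semaphore otime fields. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc leaves union semun to the caller */
union semun {
	int val;
	struct semid_ds *buf;
	unsigned short *array;
};

int main(void)
{
	struct semid_ds ds;
	union semun arg = { .buf = &ds };
	struct sembuf op = { .sem_num = 3, .sem_op = +1 };
	int semid = semget(IPC_PRIVATE, 64, IPC_CREAT | 0600);

	if (semid < 0)
		return 1;
	semop(semid, &op, 1);		/* touches otime of semaphore 3 only */
	semctl(semid, 0, IPC_STAT, arg);
	printf("sem_otime: %ld\n", (long)ds.sem_otime);
	semctl(semid, 0, IPC_RMID);
	return 0;
}

[The trade-off in the patch is deliberate: semop(), the hot path, now writes only the
cacheline of the semaphore it operated on (padded out in this quick hack via filler[64]
and ____cacheline_aligned_in_smp), while the O(sem_nsems) scan in get_semotime() is
confined to the rare IPC_STAT and /proc readers.]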