Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
Is the performance improvement acceptable, or are there further comments on this patch?

Thanks
Ling

2016-04-05 11:44 GMT+08:00 Ling Ma :
> Hi Longman,
>
>> with some modest increase in performance. That can be hard to justify. Maybe
>> you should find other use cases that involve less changes, but still have
>> noticeable performance improvement. That will make it easier to be accepted.
>
> The attachment is for another use case with the new lock optimization.
> It includes two files: main.c (the user-space workload) and
> fcntl-lock-opt.patch (a kernel patch against 4.3.0-rc4).
> (The hardware platform is an Intel E5-2699 v3, 72 threads (18 cores * 2 sockets * 2 HT).)
>
> 1. When we run a.out from main.c on the original 4.3.0-rc4 kernel,
> the average throughput from a.out is 1887592 (98% CPU cost from perf top -d1).
>
> 2. When we run a.out from main.c with fcntl-lock-opt.patch applied,
> the average throughput from a.out is 5277281 (91% CPU cost from perf top -d1).
>
> So the new mechanism gives us about a 2.79x (5277281 / 1887592) improvement.
>
> Appreciate your comments.
>
> Thanks
> Ling
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
Hi Longman,

> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve less changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.

The attachment is for another use case with the new lock optimization.
It includes two files: main.c (the user-space workload) and
fcntl-lock-opt.patch (a kernel patch against 4.3.0-rc4).
(The hardware platform is an Intel E5-2699 v3, 72 threads (18 cores * 2 sockets * 2 HT).)

1. When we run a.out from main.c on the original 4.3.0-rc4 kernel,
the average throughput from a.out is 1887592 (98% CPU cost from perf top -d1).

2. When we run a.out from main.c with fcntl-lock-opt.patch applied,
the average throughput from a.out is 5277281 (91% CPU cost from perf top -d1).

So the new mechanism gives us about a 2.79x (5277281 / 1887592) improvement.

Appreciate your comments.

Thanks
Ling

Attachment: test-lock.tar (Unix tar archive)
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. Large patch like the one in the tar ball is hard to
> review.

Ok, we will do it.

> Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve less changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.

To justify it, the attachment in this letter includes 3 files:

1. User-space code (thread.c), which generates heavy contention on hot
kernel spinlocks from __kmalloc and kfree on a multi-core platform.

2. ali_work_queue.patch, the kernel patch for 4.3.0-rc4. When we run the
user-space code (thread.c) with this patch, the synchronization cost from
__kmalloc and kfree is about 15% on an Intel E5-2699 v3.

3. org_spin_lock.patch, which applies on top of ali_work_queue.patch. When
we run thread.c with this patch, the synchronization cost from __kmalloc
and kfree is about 25% on an Intel E5-2699 v3.

The main difference between ali_work_queue.patch and org_spin_lock.patch is
as below:

diff --git a/mm/slab.h b/mm/slab.h
...
-       ali_spinlock_t list_lock;
+       spinlock_t list_lock;
...
diff --git a/mm/slab.c b/mm/slab.c
...
-       alispinlock(lock, &info);
+       spin_lock((spinlock_t *)lock);
+       fn(para);
+       spin_unlock((spinlock_t *)lock);
...

The above change removes all performance noise from program modification.
We ran the user-space code thread.c with ali_work_queue.patch and
org_spin_lock.patch respectively; the output from thread.c is as below:

ORG         NEW
38923684    43380604
38100464    44163011
37769241    43354266
37908638    43554022
37900994    43457066
38495073    43421394
37340217    43146352
38083979    43506951
37713263    43775215
37749871    43487289
37843224    43366055
38173823    43270225
38303612    43214675
37886717    44083950
37736455    43060728
37529307    44607597
38862690    43541484
37992824    44749925
38013454    43572225
37783135    45240502
37745372    44712540
38721413    43584658
38097842    43235392

TOTAL       874675292   1005486126

So the data tell us the new mechanism improves performance by about 15%
(1005486126 / 874675292 ≈ 1.15), and the change can be justified fairly.

Thanks
Ling

2016-02-04 5:42 GMT+08:00 Waiman Long :
> On 02/02/2016 11:40 PM, Ling Ma wrote:
>>
>> Longman,
>>
>> The attachment includes user-space code (thread.c) and a kernel
>> patch (ali_work_queue.patch) based on 4.3.0-rc4;
>> we replaced all original spinlock (list_lock) uses in slab.h/c with the
>> new mechanism.
>>
>> The thread.c in user space causes lots of hot kernel spinlock contention
>> from __kmalloc and kfree;
>> perf top -d1 shows ~25% before ali_work_queue.patch; after applying
>> this patch,
>> the synchronization cost from __kmalloc and kfree is
>> reduced from ~25% to ~15% on an Intel E5-2699 v3.
>> (We also observed that the output from the user-space code (thread.c) is
>> clearly improved.)
>
> I have 2 major comments here. First of all, you should break up your patch
> into smaller ones. Large patch like the one in the tar ball is hard to
> review. Secondly, you are modifying over 1000 lines of code in mm/slab.c
> with some modest increase in performance. That can be hard to justify. Maybe
> you should find other use cases that involve less changes, but still have
> noticeable performance improvement. That will make it easier to be accepted.
>
> Cheers,
> Longman

Attachment: ali_work_queue.tar.bz2 (BZip2 compressed data)
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On 02/02/2016 11:40 PM, Ling Ma wrote:
> Longman,
>
> The attachment includes user-space code (thread.c) and a kernel
> patch (ali_work_queue.patch) based on 4.3.0-rc4;
> we replaced all original spinlock (list_lock) uses in slab.h/c with the
> new mechanism.
>
> The thread.c in user space causes lots of hot kernel spinlock contention
> from __kmalloc and kfree;
> perf top -d1 shows ~25% before ali_work_queue.patch; after applying
> this patch,
> the synchronization cost from __kmalloc and kfree is
> reduced from ~25% to ~15% on an Intel E5-2699 v3.
> (We also observed that the output from the user-space code (thread.c) is
> clearly improved.)

I have 2 major comments here. First of all, you should break up your patch
into smaller ones. Large patch like the one in the tar ball is hard to
review. Secondly, you are modifying over 1000 lines of code in mm/slab.c
with some modest increase in performance. That can be hard to justify. Maybe
you should find other use cases that involve less changes, but still have
noticeable performance improvement. That will make it easier to be accepted.

Cheers,
Longman
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
The attachment (thread.c) shows that the new mechanism improves the output
of the user-space code (thread.c) by 1.14x (1174810406 / 1026910602); kernel
spinlock cost is reduced from ~25% to ~15%, as below:

ORG         NEW
38186815    43644156
38340186    43121265
38383155    44087753
38567102    43532586
38027878    43622700
38011581    43396376
37861959    43322857
37963215    43375528
38039247    43618315
37989106    43406187
37916912    44163029
39053184    43138581
37928359    43247866
37967417    43390352
37909796    43218250
37727531    43256009
38032818    43460496
38001860    43536100
38019929    44231331
37846621    43550597
37823231    44229887
38108158    43142689
37771900    43228168
37652536    43901042
37649114    43172690
37591314    43380004
38539678    43435592

TOTAL       1026910602  1174810406

Thanks
Ling

2016-02-03 12:40 GMT+08:00 Ling Ma :
> Longman,
>
> The attachment includes user-space code (thread.c) and a kernel
> patch (ali_work_queue.patch) based on 4.3.0-rc4;
> we replaced all original spinlock (list_lock) uses in slab.h/c with the
> new mechanism.
>
> The thread.c in user space causes lots of hot kernel spinlock contention
> from __kmalloc and kfree;
> perf top -d1 shows ~25% before ali_work_queue.patch; after applying
> this patch,
> the synchronization cost from __kmalloc and kfree is
> reduced from ~25% to ~15% on an Intel E5-2699 v3.
> (We also observed that the output from the user-space code (thread.c) is
> clearly improved.)
>
> Peter, we will send an updated version according to your comments.
>
> Thanks
> Ling
>
>
> 2016-01-19 23:36 GMT+08:00 Waiman Long :
>> On 01/19/2016 03:52 AM, Ling Ma wrote:
>>>
>>> Is the performance improvement acceptable, or are there more comments
>>> on this patch?
>>>
>>> Thanks
>>> Ling
>>>
>>>
>>
>> Your alispinlock patchset should also include a use case where the lock is
>> used by some code within the kernel with test that can show a performance
>> improvement so that the reviewers can independently try it out and play
>> around with it.
>> The kernel community will not accept any patch without a use
>> case in the kernel.
>>
>> Your lock_test.tar file is not good enough as it is not a performance test
>> of the patch that you sent out.
>>
>> Cheers,
>> Longman

/** Test Case: OpenDir, Get status and close it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <pthread.h>
#include <sys/stat.h>

#define TEST_DIR "/tmp/thread"
#define MAX_TEST_THREAD (80)
#define MAX_TEST_FILE 5000

static unsigned long *result[MAX_TEST_THREAD];
static int stop = 0;

static void *case_function(void *para)
{
    int id = (int)(long)para;
    DIR *pDir;
    struct stat f_stat;
    struct dirent *entry = NULL;
    char path[256];
    char cmd[512];
    int filecnt = 0;
    int dircnt = 0;
    int filetotalsize = 0;
    unsigned long myresult = 0;
    int f = 0;

    result[id] = &myresult;

    /* Go to my path and construct empty files */
    sprintf(path, "%s/%d", TEST_DIR, id);
    printf("Creating temp file at %s\n", path);
    sprintf(cmd, "mkdir %s", path);
    system(cmd);
    chdir(path);
    for (f = 0; f < MAX_TEST_FILE; f++) {
        char name[256];
        sprintf(name, "%s/%d", path, f);
        int t = open(name, O_RDWR | O_CREAT | O_TRUNC, S_IRWXU);
        if (t != -1)
            close(t);
        else {
            printf("Errno = %d.\n", errno);
            exit(errno);
        }
    }

again:
    if ((pDir = opendir(path)) == NULL) {
        printf("Failed to open %s: no such file or directory\n", TEST_DIR);
        goto err;
    }
    while ((entry = readdir(pDir)) != NULL) {
        struct stat buf;

        if (entry->d_name[0] == '.')
            continue;
        //f = open(entry->d_name, 0);
        f = stat(entry->d_name, &buf); /* returns 0 on success; the close()
                                          below is leftover from the open()
                                          variant above */
        if (f)
            close(f);
        myresult++;
        //printf("Filename %s, size %10d", entry->d_name, f_stat.st_size);
    }
    closedir(pDir);

    /* Need to stop? */
    if (!stop)
        goto again;
    return 0;
err:
    return NULL;
}

int main(void)
{
    int i;
    pthread_t thread;

    system("mkdir "TEST_DIR);
    for (i = 0; i < MAX_TEST_THREAD; i++)
        pthread_create(&thread, NULL, case_function, (void *)(long)i);
    while (1) {
        sleep(1);
        unsigned long times = 0;
        //printf("Statistics:\n");
        for (i = 0; i < MAX_TEST_THREAD; i++) {
            //printf("%d\t", *result[i]);
            times = times + *result[i];
        }
        printf("%ld\t\n", times);
        for (i = 0; i < MAX_TEST_THREAD; i++)
            *result[i] = 0;
    }
}
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
Longman,

The attachment includes user-space code (thread.c) and a kernel
patch (ali_work_queue.patch) based on 4.3.0-rc4;
we replaced all original spinlock (list_lock) uses in slab.h/c with the
new mechanism.

The thread.c in user space causes lots of hot kernel spinlock contention
from __kmalloc and kfree;
perf top -d1 shows ~25% before ali_work_queue.patch; after applying this
patch, the synchronization cost from __kmalloc and kfree is reduced from
~25% to ~15% on an Intel E5-2699 v3.
(We also observed that the output from the user-space code (thread.c) is
clearly improved.)

Peter, we will send an updated version according to your comments.

Thanks
Ling


2016-01-19 23:36 GMT+08:00 Waiman Long :
> On 01/19/2016 03:52 AM, Ling Ma wrote:
>>
>> Is the performance improvement acceptable, or are there more comments
>> on this patch?
>>
>> Thanks
>> Ling
>>
>>
>
> Your alispinlock patchset should also include a use case where the lock is
> used by some code within the kernel with test that can show a performance
> improvement so that the reviewers can independently try it out and play
> around with it. The kernel community will not accept any patch without a use
> case in the kernel.
>
> Your lock_test.tar file is not good enough as it is not a performance test
> of the patch that you sent out.
>
> Cheers,
> Longman

Attachment: ali_work_queue.tar.bz2 (BZip2 compressed data)
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On Wed, 6 Jan 2016 09:21:06 +0100 Peter Zijlstra wrote:
> On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > > > It suffers the typical problems all those constructs do; namely it
> > > > wrecks accountability.
> > >
> > > That's "government thinking" ;-) - for most real users throughput is
> > > more important than accountability. With the right API it ought to also
> > > be compile time switchable.
> >
> > Its to do with having been involved with -rt. RT wants to do
> > accountability for such things because of PI and sorts.
>
> Also, real people really do care about latency too, very bad worst case
> spikes to upset things.

Some yes - I'm familiar with the way some of the big financial number
crunching jobs need this. There are also people who instead care a lot
about throughput.

Anything like this needs to end up with an external API which looks the
same whether the work is done via one thread or the other.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On Wed, Jan 06, 2016 at 09:16:43AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > > It suffers the typical problems all those constructs do; namely it
> > > wrecks accountability.
> >
> > That's "government thinking" ;-) - for most real users throughput is
> > more important than accountability. With the right API it ought to also
> > be compile time switchable.
>
> Its to do with having been involved with -rt. RT wants to do
> accountability for such things because of PI and sorts.

Also, real people really do care about latency too; very bad worst-case
spikes upset things.
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On Tue, Jan 05, 2016 at 09:42:27PM +, One Thousand Gnomes wrote:
> > It suffers the typical problems all those constructs do; namely it
> > wrecks accountability.
>
> That's "government thinking" ;-) - for most real users throughput is
> more important than accountability. With the right API it ought to also
> be compile time switchable.

Its to do with having been involved with -rt. RT wants to do
accountability for such things because of PI and sorts.

> > But here that is compounded by the fact that you inject other people's
> > work into 'your' lock region, thereby bloating lock hold times. Worse,
> > afaict (from a quick reading) there really isn't a bound on the amount
> > of work you inject.
>
> That should be relatively easy to fix but for this kind of lock you
> normally get the big wins from stuff that is only a short amount of
> executing code. The fairness you trade in the cases it is useful should
> be tiny except under extreme load, where the "accountability first"
> behaviour would be to fall over in a heap.
>
> If your "lock" involves a lot of work then it probably should be a work
> queue or not using this kind of locking.

Sure, but the fact that it was not even mentioned/considered doesn't give
me a warm fuzzy feeling.

> > And while its a cute collapse of an MCS lock and lockless list style
> > work queue (MCS after all is a lockless list), saving a few cycles from
> > the naive spinlock+llist implementation of the same thing, I really
> > do not see enough justification for any of this.
>
> I've only personally dealt with such locks in the embedded space but
> there it was a lot more than a few cycles because you go from

Nah, what I meant was that you can do the same callback style construct
with a llist and a spinlock.

> The claim in the original post is 3x performance but doesn't explain
> performance doing what, or which kernel locks were switched and what
> patches were used. I don't find the numbers hard to believe for a big big
> box, but I'd like to see the actual use case patches so it can be benched
> with other workloads and also for latency and the like.

Very much agreed; those claims need to be substantiated with actual
patches using this thing and independently verified.
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
> It suffers the typical problems all those constructs do; namely it
> wrecks accountability.

That's "government thinking" ;-) - for most real users throughput is
more important than accountability. With the right API it ought to also
be compile time switchable.

> But here that is compounded by the fact that you inject other people's
> work into 'your' lock region, thereby bloating lock hold times. Worse,
> afaict (from a quick reading) there really isn't a bound on the amount
> of work you inject.

That should be relatively easy to fix, but for this kind of lock you
normally get the big wins from stuff that is only a short amount of
executing code. The fairness you trade in the cases it is useful should
be tiny except under extreme load, where the "accountability first"
behaviour would be to fall over in a heap.

If your "lock" involves a lot of work then it probably should be a work
queue or not using this kind of locking.

> And while its a cute collapse of an MCS lock and lockless list style
> work queue (MCS after all is a lockless list), saving a few cycles from
> the naive spinlock+llist implementation of the same thing, I really
> do not see enough justification for any of this.

I've only personally dealt with such locks in the embedded space, but
there it was a lot more than a few cycles, because you go from

    take lock
        spins
        pull things into cache
    do stuff
        cache lines go write/exclusive
    unlock
    take lock
        move all the cache
    do stuff
    etc.

to

    take lock
    queue work
        pull things into cache
    do work 1
        cache lines go write/exclusive
    do work 2
    unlock
    done

and for the kind of stuff you apply those locks to, you got big
improvements. Even on crappy little embedded processors cache bouncing
hurts. Even better, work-merging locks like this tend to improve
throughput more the higher the contention, unlike most other lock types.

The claim in the original post is 3x performance but doesn't explain
performance doing what, or which kernel locks were switched and what
patches were used. I don't find the numbers hard to believe for a big big
box, but I'd like to see the actual use case patches so it can be benched
with other workloads and also for latency and the like.

Alan
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On Thu, Dec 31, 2015 at 04:09:34PM +0800, ling.ma.prog...@gmail.com wrote:
> +void alispinlock(struct ali_spinlock *lock, struct ali_spinlock_info *ali)
> +{
> +	struct ali_spinlock_info *next, *old;
> +
> +	ali->next = NULL;
> +	ali->locked = 1;
> +	old = xchg(&lock->lock_p, ali);
> +
> +	/* If NULL we are the first one */
> +	if (old) {
> +		WRITE_ONCE(old->next, ali);
> +		if(ali->flags & ALI_LOCK_FREE)
> +			return;
> +		while((READ_ONCE(ali->locked)))
> +			cpu_relax_lowlatency();
> +		return;
> +	}
> +	old = READ_ONCE(lock->lock_p);
> +
> +	/* Handle all pending works */
> +repeat:
> +	if(old == ali)
> +		goto end;
> +
> +	while (!(next = READ_ONCE(ali->next)))
> +		cpu_relax();
> +
> +	ali->fn(ali->para);
> +	ali->locked = 0;
> +
> +	if(old != next) {
> +		while (!(ali = READ_ONCE(next->next)))
> +			cpu_relax();
> +		next->fn(next->para);
> +		next->locked = 0;
> +		goto repeat;
> +
> +	} else
> +		ali = next;

So I have a whole bunch of problems with this thing..

For one I object to this being called a lock. Its much more like an async
work queue like thing.

It suffers the typical problems all those constructs do; namely it
wrecks accountability.

But here that is compounded by the fact that you inject other people's
work into 'your' lock region, thereby bloating lock hold times. Worse,
afaict (from a quick reading) there really isn't a bound on the amount
of work you inject.

This will completely wreck scheduling latency. At the very least the
callback loop should have a need_resched() test in it, but even that will
not work if this has IRQs disabled.

And while its a cute collapse of an MCS lock and lockless list style
work queue (MCS after all is a lockless list), saving a few cycles from
the naive spinlock+llist implementation of the same thing, I really
do not see enough justification for any of this.
Re: [RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
On 12/31/2015 03:09 AM, ling.ma.prog...@gmail.com wrote:
> From: Ma Ling
>
> Hi ALL,
>
> Wire latency (RC delay) dominates modern computer performance;
> conventional serialized work causes serious cache-line ping-pong, and
> the process spends lots of time and power to complete, especially on
> multi-core platforms. However, if the serialized works are sent to one
> core and executed when lock contention happens, that can save much time
> and power, because all shared data are located in the private cache of
> one core. We call this mechanism Acceleration from Lock Integration
> (ali spinlock).
>
> Usually when requests are queued, we have to wait for the work to be
> submitted one by one. In order to improve the whole throughput further,
> we introduce LOCK_FREE: when requests are sent to the lock owner, the
> requester may do other work in parallel, and the ali_spin_is_completed
> function can tell us whether the work has been completed.
>
> The new code is based on qspinlock and implements Lock Integration; it
> improves performance up to 3X on an Intel platform with 72 cores
> (18 cores x 2 HT x 2 sockets, HSW), and 2X on an ARM platform with 96
> cores. Additional trivial changes to Makefile/Kconfig are made to
> enable compiling of this feature on the x86 platform.
> (We would like to do further experiments according to your requirements.)
>
> Happy New Year 2016!
>
> Ling
>
> Signed-off-by: Ma Ling
> ---
>  arch/x86/Kconfig             |    1 +
>  include/linux/alispinlock.h  |   41 ++
>  kernel/Kconfig.locks         |    7 +++
>  kernel/locking/Makefile      |    1 +
>  kernel/locking/alispinlock.c |   97 ++
>  5 files changed, 147 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/alispinlock.h
>  create mode 100644 kernel/locking/alispinlock.c

You should include additional patches that illustrate the possible use
cases and performance improvement before and after the patches. This will
allow the reviewers to actually try it out and play with it.
Cheers,
Longman
[RFC PATCH] alispinlock: acceleration from lock integration on multi-core platform
From: Ma Ling

Hi ALL,

Wire latency (RC delay) dominates modern computer performance; conventional
serialized work causes serious cache-line ping-pong, and the process spends
lots of time and power to complete, especially on multi-core platforms.
However, if the serialized works are sent to one core and executed when
lock contention happens, that can save much time and power, because all
shared data are located in the private cache of one core. We call this
mechanism Acceleration from Lock Integration (ali spinlock).

Usually when requests are queued, we have to wait for the work to be
submitted one by one. In order to improve the whole throughput further, we
introduce LOCK_FREE: when requests are sent to the lock owner, the
requester may do other work in parallel, and the ali_spin_is_completed
function can tell us whether the work has been completed.

The new code is based on qspinlock and implements Lock Integration; it
improves performance up to 3X on an Intel platform with 72 cores
(18 cores x 2 HT x 2 sockets, HSW), and 2X on an ARM platform with 96
cores. Additional trivial changes to Makefile/Kconfig are made to enable
compiling of this feature on the x86 platform.
(We would like to do further experiments according to your requirements.)

Happy New Year 2016!
Ling

Signed-off-by: Ma Ling
---
 arch/x86/Kconfig             |    1 +
 include/linux/alispinlock.h  |   41 ++
 kernel/Kconfig.locks         |    7 +++
 kernel/locking/Makefile      |    1 +
 kernel/locking/alispinlock.c |   97 ++
 5 files changed, 147 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/alispinlock.h
 create mode 100644 kernel/locking/alispinlock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index db3622f..47d9277 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -42,6 +42,7 @@ config X86
 	select ARCH_USE_CMPXCHG_LOCKREF if X86_64
 	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_USE_QUEUED_SPINLOCKS
+	select ARCH_USE_ALI_SPINLOCKS
 	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH if SMP
 	select ARCH_WANTS_DYNAMIC_TASK_STRUCT
 	select ARCH_WANT_FRAME_POINTERS
diff --git a/include/linux/alispinlock.h b/include/linux/alispinlock.h
new file mode 100644
index 000..5207c41
--- /dev/null
+++ b/include/linux/alispinlock.h
@@ -0,0 +1,41 @@
+#ifndef ALI_SPINLOCK_H
+#define ALI_SPINLOCK_H
+/*
+ * Acceleration from Lock Integration
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2015 Alibaba Group.
+ *
+ * Authors: Ma Ling
+ *
+ */
+typedef struct ali_spinlock {
+	void *lock_p;
+} ali_spinlock_t;
+
+struct ali_spinlock_info {
+	struct ali_spinlock_info *next;
+	int flags;
+	int locked;
+	void (*fn)(void *);
+	void *para;
+};
+
+static __always_inline int ali_spin_is_completed(struct ali_spinlock_info *ali)
+{
+	return (READ_ONCE(ali->locked) == 0);
+}
+
+void alispinlock(struct ali_spinlock *lock, struct ali_spinlock_info *ali);
+
+#define ALI_LOCK_FREE 1
+#endif /* ALI_SPINLOCK_H */
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index ebdb004..5130c63 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -235,6 +235,13 @@ config LOCK_SPIN_ON_OWNER
 	def_bool y
 	depends on MUTEX_SPIN_ON_OWNER || RWSEM_SPIN_ON_OWNER
 
+config ARCH_USE_ALI_SPINLOCKS
+	bool
+
+config ALI_SPINLOCKS
+	def_bool y if ARCH_USE_ALI_SPINLOCKS
+	depends on SMP
+
 config ARCH_USE_QUEUED_SPINLOCKS
 	bool
 
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 8e96f6c..a4241f8 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_LOCKDEP) += lockdep.o
 ifeq ($(CONFIG_PROC_FS),y)
 obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
 endif
+obj-$(CONFIG_ALI_SPINLOCKS) += alispinlock.o
 obj-$(CONFIG_SMP) += spinlock.o
 obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
 obj-$(CONFIG_SMP) += lglock.o
diff --git a/kernel/locking/alispinlock.c b/kernel/locking/alispinlock.c
new file mode 100644
index 000..43078b4
--- /dev/null
+++ b/kernel/locking/alispinlock.c
@@ -0,0 +1,97 @@
+/*
+ * Acceleration from Lock Integration
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY;