> I have 2 major comments here. First of all, you should break up your patch > into smaller ones. Large patch like the one in the tar ball is hard to > review.
Ok, we will do it. >Secondly, you are modifying over 1000 lines of code in mm/slab.c > with some modest increase in performance. That can be hard to justify. Maybe > you should find other use cases that involve less changes, but still have > noticeable performance improvement. That will make it easier to be accepted. In order to be justified the attachment in this letter include 3 files: 1. user space code (thread.c), which can cause lots of hot kernel spinlock from __kmalloc and kfree on multi-core platform 2. ali_work_queue.patch , the kernel patch for 4.3.0-rc4, when we run user space code (thread.c) based on the patch, the synchronous operation consumption from __kmalloc and kfree is about 15% on Intel E5-2699V3 3. org_spin_lock.patch, which is based on above ali_work_queue.patch, when we run user space code thread.c based on the patch, the synchronous operation consumption from __kmalloc and kfree is about 25% on Intel E5-2699V3 the main difference between ali_work_queue.patch and org_spin_lock.patch as below: diff --git a/mm/slab.h b/mm/slab.h ... - ali_spinlock_t list_lock; + spinlock_t list_lock; ... diff --git a/mm/slab.c b/mm/slab.c ... - alispinlock(lock, &info); + spin_lock((spinlock_t *)lock); + fn(para); + spin_unlock((spinlock_t *)lock); ... The above operations remove all performance noise from program modification. We run user space code thread.c with ali_work_queue.patch, and org_spin_lock.patch respectively the output from thread.c as below: ORG NEW 38923684 43380604 38100464 44163011 37769241 43354266 37908638 43554022 37900994 43457066 38495073 43421394 37340217 43146352 38083979 43506951 37713263 43775215 37749871 43487289 37843224 43366055 38173823 43270225 38303612 43214675 37886717 44083950 37736455 43060728 37529307 44607597 38862690 43541484 37992824 44749925 38013454 43572225 37783135 45240502 37745372 44712540 38721413 43584658 38097842 43235392 ORG NEW TOTAL 874675292 1005486126 So the data tell us the new mechanism can improve performance 14% ( 1005486126/874675292) , and the operation can be justified fairly. Thanks Ling 2016-02-04 5:42 GMT+08:00 Waiman Long <waiman.l...@hpe.com>: > On 02/02/2016 11:40 PM, Ling Ma wrote: >> >> Longman, >> >> The attachment include user space code(thread.c), and kernel >> patch(ali_work_queue.patch) based on 4.3.0-rc4, >> we replaced all original spinlock (list_lock) in slab.h/c with the >> new mechanism. >> >> The thread.c in user space caused lots of hot kernel spinlock from >> __kmalloc and kfree, >> perf top -d1 shows ~25% before ali_work_queue.patch,after appending >> this patch , >> the synchronous operation consumption from __kmalloc and kfree is >> reduced from 25% to ~15% on Intel E5-2699V3 >> (we also observed the output from user space code (thread.c) is >> improved clearly) > > > I have 2 major comments here. First of all, you should break up your patch > into smaller ones. Large patch like the one in the tar ball is hard to > review. Secondly, you are modifying over 1000 lines of code in mm/slab.c > with some modest increase in performance. That can be hard to justify. Maybe > you should find other use cases that involve less changes, but still have > noticeable performance improvement. That will make it easier to be accepted. > > Cheers, > Longman > >
ali_work_queue.tar.bz2
Description: BZip2 compressed data