Re: [RFC PATCH v4 0/9] KVM: selftests: some improvement and a new test for kvm page table
Hi all,

Kindly ping :)!

Are there any further comments for this v4 series? Please let me know if there is still something that needs fixing. Or is this v4 series fine enough to be queued? Most of the patches have been added with Reviewed-by. If there are merge conflicts with the newest branch, please also let me know and I will send a new version fixed.

Regards,
Yanan

On 2021/3/2 20:57, Yanan Wang wrote:
> [...]
[RFC PATCH v4 0/9] KVM: selftests: some improvement and a new test for kvm page table
Hi,

This v4 series mainly consists of two parts.

Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyana...@huawei.com/
Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyana...@huawei.com/
Links of v3: https://lore.kernel.org/lkml/20210301065916.11484-1-wangyana...@huawei.com/

In the first part, all the known hugetlb backing src types specified with different hugepage sizes are listed, so that we can request a hugetlb source of the exact granularity that we want, instead of the system default one. And as all the known hugetlb page sizes are listed, it is appropriate for all architectures. Besides, a helper that can get the granularity of different backing src types (anonymous/thp/hugetlb) is added, so that we can use the accurate backing src granularity for all kinds of alignment or guest memory accesses by vcpus.

In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer for kvm page table code (GPA->HPA mappings); it gives guidance for people trying to improve kvm. The following explains what exactly we can do with this test.

The function guest_code() can cover the conditions where a single vcpu or multiple vcpus access guest pages within the same memory region, in three VM stages (before dirty logging, during dirty logging, after dirty logging). Besides, the backing src memory type (ANONYMOUS/THP/HUGETLB) of the tested memory region can be specified by users, which means users can choose whether normal page mappings or block mappings are created in the test.

If ANONYMOUS memory is specified, kvm will create normal page mappings for the tested memory region before dirty logging, and update attributes of the page mappings from RO to RW during dirty logging. If THP/HUGETLB memory is specified, kvm will create block mappings for the tested memory region before dirty logging, split the block mappings into normal page mappings during dirty logging, and coalesce the page mappings back into block mappings after dirty logging is stopped.

So in summary, as a performance tester, this test can present the performance of kvm creating/updating normal page mappings, or the performance of kvm creating/splitting/recovering block mappings, through execution time.

When we need to coalesce the page mappings back to block mappings after dirty logging is stopped, we have to first invalidate *all* the TLB entries for the page mappings right before installation of the block entry, because a TLB conflict abort error could occur if we can't invalidate the TLB entries fully. We have hit this TLB conflict twice in the aarch64 software implementation and fixed it. As this test can simulate the process from dirty logging enabled to dirty logging stopped of a VM with block mappings, it can also reproduce this TLB conflict abort due to inadequate TLB invalidation when coalescing tables.

Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyana...@huawei.com/

---

Change logs:

v3->v4:
- Add a helper to get system default hugetlb page size
- Add tags of Reviewed-by of Ben in the patches

v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic macro to get hugetlb page sizes
- Some changes for suggestions about v2 series

v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series

---

Yanan Wang (9):
  tools headers: sync headers of asm-generic/hugetlb_encode.h
  tools headers: Add a macro to get HUGETLB page sizes for mmap
  KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
  KVM: selftests: Make a generic helper to get vm guest mode strings
  KVM: selftests: Add a helper to get system configured THP page size
  KVM: selftests: Add a helper to get system default hugetlb page size
  KVM: selftests: List all hugetlb src types specified with page sizes
  KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
  KVM: selftests: Add a test for kvm page table code

 include/uapi/linux/mman.h                     |   2 +
 tools/include/asm-generic/hugetlb_encode.h    |   3 +
 tools/include/uapi/linux/mman.h               |   2 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../selftests/kvm/demand_paging_test.c        |   8 +-
 .../selftests/kvm/dirty_log_perf_test.c       |  14 +-
 .../testing/selftests/kvm/include/kvm_util.h  |   4 +-
 .../testing/selftests/kvm/include/test_util.h |  21 +-
 .../selftests/kvm/kvm_page_table_test.c       | 476 ++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  59 ++-
 tools/testing/selftests/kvm/lib/test_util.c   | 122 -
 tools/testing/selftests/kvm/steal_time.c      |   4 +-
 12 files changed, 659 insertions(+), 59 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/kvm_page_table_te
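
The patch titled "Add a macro to get HUGETLB page sizes for mmap" is not shown in this cover letter, so the following is only a sketch of the idea, assuming the usual Linux encoding in which log2 of the huge page size is stored in the mmap flags starting at bit 26 with a 6-bit mask. The `SKETCH_*` and `sketch_*` names are hypothetical and are not the names used by the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: same layout as the Linux uapi hugetlb encoding
 * (shift 26, mask 0x3f). Redefined locally under sketch-only names
 * so the example builds without kernel headers.
 */
#define SKETCH_HUGE_SHIFT 26
#define SKETCH_HUGE_MASK  0x3fULL

/* Encode log2(page size) into the mmap flag bits. */
static inline uint64_t sketch_encode_huge(unsigned int log2_size)
{
	assert(log2_size <= SKETCH_HUGE_MASK);
	return (uint64_t)log2_size << SKETCH_HUGE_SHIFT;
}

/*
 * Decode the huge page size in bytes from mmap flags.
 * Returns 0 when no explicit size is encoded, i.e. the caller
 * would fall back to the system default hugetlb size.
 */
static inline uint64_t sketch_huge_page_size(uint64_t mmap_flags)
{
	unsigned int log2_size =
		(unsigned int)((mmap_flags >> SKETCH_HUGE_SHIFT) & SKETCH_HUGE_MASK);

	return log2_size ? 1ULL << log2_size : 0;
}
```

With this layout a 2MB request encodes log2 = 21 and a 1GB request encodes log2 = 30; other flag bits outside the 6-bit field do not affect the decoded size.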
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On (09/06/19 16:01), Peter Zijlstra wrote:
> In fact, i've gotten output that is plain impossible with
> the current junk.

Peter, can you post any of those backtraces? Very curious.

	-ss
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Fri, Sep 06, 2019 at 04:01:26PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 06, 2019 at 02:42:11PM +0200, Petr Mladek wrote:
> > 7. People would complain when continuous lines become less
> >    reliable. It might be most visible when mixing backtraces
> >    from all CPUs. Simple sorting by prefix will not make
> >    it readable. The historic way was to synchronize CPUs
> >    by a spin lock. But then the cpu_lock() could cause
> >    deadlock.
>
> Why? I'm running with that thing on, I've never seen a deadlock ever
> because of it. In fact, i've gotten output that is plain impossible with
> the current junk.
>
> The cpu-lock is inside the all-backtrace spinlock, not outside. And as I
> said yesterday, only the lockless console has any wait-loops while
> holding the cpu-lock. It _will_ make progress.

So I've been a huge flaming idiot.. so while I'm not particularly
sympathetic to NMIs that block, there are a number of really trivial
deadlocks possible -- and it is a minor miracle I've not actually hit
them (I suppose because printk() isn't really all that common).

The whole cpu-lock thing I had needs to go. But not having it makes
lockless console output unreadable and unusable garbage.

I've got some ideas on a replacement, but I need to further consider it.

:-/
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On 2019-09-06, Peter Zijlstra wrote:
>> I wish it was that simple. It is possible that I see it too
>> complicated. But this comes to my mind:
>>
>> 1. The simple printk_buffer_store(buf, n) is not NMI safe. For this,
>>    we might need the reserve-store approach.
>
> Of course it is, and sure it has a reserve+commit internally. I'm sure
> I posted an implementation of something like this at some point.
>
> It is lockless (wait-free in fact, which is stronger) and supports
> multi-readers. I'm sure I posted something like that before, and ISTR
> John has something like that around somewhere too.

Yes. It was called RFCv1[0].

> The only thing I'm omitting is doing vscnprintf() twice, first to
> determine the length, and then into the reservation. Partly because I
> think that is silly and 256 chars should be plenty for everyone,
> partly because that avoids having vscnprintf() inside the cpu_lock()
> and partly because it is simpler to not do that.

Yes, this approach is more straightforward and was suggested in the
feedback to RFCv1. Although I think the current limit (1024) should
still be OK. Then we have 1 dedicated page per CPU for vscnprintf().

>> 2. The simple approach works only with lockless consoles. We need
>>    something else for the rest at least for NMI. Simple offloading
>>    to a kthread has been blocked for years. People wanted the
>>    trylock-and-flush-immediately approach.
>
> Have an irq_work to wake up a kthread that will print to shit
> consoles.

This is the approach in all the RFC versions.

>> 5. John planned to use the cpu_lock in the lockless consoles.
>>    I wonder if it was only in the console->write() callback
>>    or if it would spread the lock more widely.

The 8250 driver in RFCv1 uses the cpu-lock in console->write() on a
per-character basis and in console->write_atomic() on a per-line
basis. This is necessary because the 8250 driver cannot run lockless. It
requires synchronization for its UART_IER clearing/setting before/after
transmit.

IMO the existing early console implementations are _not_ safe for
preemption. This was the reason for the new write_atomic() callback in
RFCv1.

> Right, I'm saying that since you need it anyway, lift it up one layer.
> It makes everything simpler. More simpler is more better.

This was my reasoning for using the cpu-lock in RFCv1. Moving to a
lockless ringbuffer for RFCv2 was because there was too much
resistance/concern surrounding the cpu-lock. But yes, if we want to
support atomic consoles, the cpu-lock will still be needed.

The cpu-lock (and the related concerns) were discussed here[1].

>> 7. People would complain when continuous lines become less
>>    reliable. It might be most visible when mixing backtraces
>>    from all CPUs. Simple sorting by prefix will not make
>>    it readable. The historic way was to synchronize CPUs
>>    by a spin lock. But then the cpu_lock() could cause
>>    deadlock.
>
> Why? I'm running with that thing on, I've never seen a deadlock ever
> because of it.

As was discussed in the thread I just mentioned, introducing the
cpu-lock means that _all_ NMI functions taking spinlocks need to use the
cpu-lock. Even though Peter has never seen a deadlock, a deadlock is
possible if a BUG is triggered while one such spinlock is held. Also
note that it is not allowed to have 2 cpu-locks in the system. This is
where the BKL references started showing up.

Spinlocks in NMI context are rare, but they have existed in the past and
could exist again in the future. My suggestion was to create the policy
that any needed locking in NMI context must be done using the one
cpu-lock.

John Ogness

[0] https://lkml.kernel.org/r/20190212143003.48446-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20190227094655.ecdwhsc2bf5sp...@pathway.suse.cz
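
For readers who have not seen RFCv1, the cpu-lock being discussed is a lock that the owning CPU may take again (for example from an NMI that interrupts the holder). The following is only a userspace sketch of that idea with C11 atomics, not the RFCv1 code: the `sketch_*` names are hypothetical, and the CPU id is passed in explicitly where kernel code would obtain it itself with preemption disabled.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NO_OWNER (-1)

/* Id of the CPU currently holding the lock, or SKETCH_NO_OWNER. */
static atomic_int sketch_owner = SKETCH_NO_OWNER;

/*
 * Take the lock on behalf of `cpu`. Returns the previous owner so the
 * matching unlock can restore it: SKETCH_NO_OWNER for the outermost
 * acquisition, `cpu` itself for a nested one.
 */
static int sketch_cpu_lock(int cpu)
{
	assert(cpu >= 0);

	/* Nested acquisition on the owning CPU succeeds immediately. */
	if (atomic_load_explicit(&sketch_owner, memory_order_relaxed) == cpu)
		return cpu;

	for (;;) {
		int expected = SKETCH_NO_OWNER;

		if (atomic_compare_exchange_weak_explicit(&sketch_owner,
							  &expected, cpu,
							  memory_order_acquire,
							  memory_order_relaxed))
			return SKETCH_NO_OWNER;
		/* Another CPU holds the lock: spin and retry. */
	}
}

/* Restore the owner recorded by the matching sketch_cpu_lock(). */
static void sketch_cpu_unlock(int old)
{
	atomic_store_explicit(&sketch_owner, old, memory_order_release);
}
```

The sketch also makes the concern above visible: if code on CPU A spins on some other spinlock while holding this lock, and the holder of that spinlock is spinning in `sketch_cpu_lock()` on CPU B, neither side makes progress.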
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On (09/06/19 16:01), Peter Zijlstra wrote:
> > 2. The simple approach works only with lockless consoles. We need
> >    something else for the rest at least for NMI. Simple offloading
> >    to a kthread has been blocked for years. People wanted the
> >    trylock-and-flush-immediately approach.
>
> Have an irq_work to wake up a kthread that will print to shit consoles.

Do we need sched dependency? We can print a batch of pending
logbuf messages and queue another irq_work if there are more
pending messages, right?

	-ss
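
The batching idea in this question can be sketched in plain C as follows. This is a userspace model only: the `sketch_*` names and the batch size of 4 are made up for illustration, and the loop in `sketch_drain()` stands in for re-queueing an irq_work, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BATCH 4	/* hypothetical per-invocation budget */

struct sketch_log {
	size_t head;	/* next sequence number to be written */
	size_t tail;	/* next sequence number to be printed */
};

/*
 * One deferred-work invocation: print at most SKETCH_BATCH pending
 * records. Returns true when records remain, i.e. when the work
 * would have to be queued again.
 */
static bool sketch_flush_batch(struct sketch_log *log)
{
	size_t budget = SKETCH_BATCH;

	assert(log->tail <= log->head);
	while (budget-- && log->tail < log->head)
		log->tail++;	/* stand-in for writing one record */

	return log->tail < log->head;
}

/* Count how many invocations are needed to drain the log. */
static unsigned int sketch_drain(struct sketch_log *log)
{
	unsigned int runs = 1;

	while (sketch_flush_batch(log))
		runs++;

	return runs;
}
```

The bounded budget is the point of the design: each invocation does a fixed amount of work in interrupt context and hands the rest to a later invocation instead of looping until the buffer is empty.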
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Fri, Sep 06, 2019 at 04:01:26PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 06, 2019 at 02:42:11PM +0200, Petr Mladek wrote:
> > 7. People would complain when continuous lines become less
> >    reliable. It might be most visible when mixing backtraces
> >    from all CPUs. Simple sorting by prefix will not make
> >    it readable. The historic way was to synchronize CPUs
> >    by a spin lock. But then the cpu_lock() could cause
> >    deadlock.
>
> Why? I'm running with that thing on, I've never seen a deadlock ever
> because of it. In fact, i've gotten output that is plain impossible with
> the current junk.
>
> The cpu-lock is inside the all-backtrace spinlock, not outside. And as I
> said yesterday, only the lockless console has any wait-loops while
> holding the cpu-lock. It _will_ make progress.

Oooh, I think I see. So one solution would be to pass the NMI along in
chain like. Send it to a single CPU at a time, when finished, send it to
the next.
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Fri, Sep 06, 2019 at 02:42:11PM +0200, Petr Mladek wrote:

> I wish it was that simple. It is possible that I see it too
> complicated. But this comes to my mind:
>
> 1. The simple printk_buffer_store(buf, n) is not NMI safe. For this,
>    we might need the reserve-store approach.

Of course it is, and sure it has a reserve+commit internally. I'm sure
I posted an implementation of something like this at some point.

It is lockless (wait-free in fact, which is stronger) and supports
multi-readers. I'm sure I posted something like that before, and ISTR
John has something like that around somewhere too.

The only thing I'm omitting is doing vscnprintf() twice, first to
determine the length, and then into the reservation. Partly because I
think that is silly and 256 chars should be plenty for everyone, partly
because that avoids having vscnprintf() inside the cpu_lock() and partly
because it is simpler to not do that.

> 2. The simple approach works only with lockless consoles. We need
>    something else for the rest at least for NMI. Simple offloading
>    to a kthread has been blocked for years. People wanted the
>    trylock-and-flush-immediately approach.

Have an irq_work to wake up a kthread that will print to shit consoles.

Seriously.. the trylock and flush stuff is horrific crap. You guys been
piling on the hack for years now, surely you're tired of that gunk?

(and if you _really_ care, build a flush function that 'works' mostly
and waits for the kthread of choice to finish printing to the
'important' shit console).

> 3. console_lock works in tty as a big kernel lock. I do not know
>    much details. But people familiar with the code said that
>    it was a disaster. I assume that tty is still rather
>    important console. I am not sure how it would fit into the
>    simple approach.

The kernel thread in charge of printing doesn't care.

> 4. The console handling has got non-synchronous (console_trylock)
>    quite early (ver 2.4.10, year 2001). The reason was to not
>    serialize CPUs by the speed of the console.
>
>    Serialized output could remove many troubles. The logic in
>    console_unlock() is really crazy. It might be acceptable
>    for debugging. But is it acceptable on production systems?

The kernel thread doesn't care. If you care about independent consoles,
have a kernel thread per console. That way a fast console can print fast
while a slow console will print slow and everybody is happy.

> 5. John planned to use the cpu_lock in the lockless consoles.
>    I wonder if it was only in the console->write() callback
>    or if it would spread the lock more widely.

Right, I'm saying that since you need it anyway, lift it up one layer.
It makes everything simpler. More simpler is more better.

> 6. One huge nightmare is panic() and code called from there.
>    It is a maze of hacks, including arch-specific code, to
>    prevent deadlocks and get the messages out.
>
>    Any lock might be blocked on any CPU at the moment. Or it
>    might become blocked when CPUs are stopped by NMI.
>
>    Fully lock-less log buffer might save us some headache.
>    I am not sure whether a single lock shared between printk()
>    writers and console drivers will make the situation easier
>    or more complicated.

So panic is a non issue for the lockless console.

It only matters if you care to get something out of the crap consoles.
So print everything to the lockless buffer and lockless consoles, then
try and force as much as you can out of the crap consoles.

If you die, tough luck, at least the lockless consoles and kdump image
have the whole message.

> 7. People would complain when continuous lines become less
>    reliable. It might be most visible when mixing backtraces
>    from all CPUs. Simple sorting by prefix will not make
>    it readable. The historic way was to synchronize CPUs
>    by a spin lock. But then the cpu_lock() could cause
>    deadlock.

Why? I'm running with that thing on, I've never seen a deadlock ever
because of it. In fact, i've gotten output that is plain impossible with
the current junk.

The cpu-lock is inside the all-backtrace spinlock, not outside. And as I
said yesterday, only the lockless console has any wait-loops while
holding the cpu-lock. It _will_ make progress.

> I would be really happy when we could ignore some of the problems
> or find an easy solution. I just want to make sure that we take
> into account all the known aspects.
>
> I am sure that we could do better than we do now. I do not want
> to block any improvements. I am just a bit lost in the many
> black corners.

I hope the above helps. Also note that Linus' memory buffer is a
lockless console.
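
The "kernel thread per console" suggestion comes down to each console keeping its own read position in the shared log, so that a slow console never holds back a fast one. A rough userspace model of just that bookkeeping is sketched below; the structure, its field names and the per-step `rate` are invented for illustration, and no threads or real console drivers are involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-console state: one instance per printing thread. */
struct sketch_console {
	unsigned int rate;	/* records this console can emit per step */
	uint64_t seq;		/* next log sequence number to print */
};

/*
 * One scheduling step of a console's printing thread: emit up to
 * `rate` records from the shared log, whose newest record has
 * sequence number log_head - 1. Returns the number of records emitted.
 */
static unsigned int sketch_console_step(struct sketch_console *con,
					uint64_t log_head)
{
	uint64_t pending;
	unsigned int n;

	assert(con->seq <= log_head);
	pending = log_head - con->seq;
	n = pending < con->rate ? (unsigned int)pending : con->rate;
	con->seq += n;	/* only this console's cursor advances */

	return n;
}
```

Because each cursor advances independently, the only coupling between consoles is the shared log itself: a slow console risks falling so far behind that old records are overwritten, but it cannot delay the others.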
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On (09/06/19 12:49), Peter Zijlstra wrote:
> On Fri, Sep 06, 2019 at 07:09:43PM +0900, Sergey Senozhatsky wrote:
>
> > ---
> > diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> > index 139c310049b1..9c73eb6259ce 100644
> > --- a/kernel/printk/printk_safe.c
> > +++ b/kernel/printk/printk_safe.c
> > @@ -103,7 +103,10 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
> >  	if (atomic_cmpxchg(&s->len, len, len + add) != len)
> >  		goto again;
> >
> > -	queue_flush_work(s);
> > +	if (early_console)
> > +		early_console->write(early_console, s->buffer + len, add);
> > +	else
> > +		queue_flush_work(s);
> >  	return add;
> >  }
>
> You've not been following along, that generates absolutely unreadable
> garbage.

This was more of a joke/reference to "Those NMI buffers are a trainwreck
and need to die a horrible death". Of course this needs a re-entrant cpu
lock to serialize access to atomic/early consoles.

But here is one more missing thing - we need atomic/early consoles on a
separate, sort of immutable, list. And probably forbid any modifications
of such console drivers (PM, etc.). If we can do this then we don't need
to take console_sem while we iterate that list, which removes
sched/timekeeping locks out of the fast printk() path.

We, at the same time, don't have that many options on systems without
atomic/early consoles. Move printing to NMI (e.g. up to X pending logbuf
lines per NMI)? Move printing to IPI (again, up to X pending logbuf lines
per IPI)? printk() softirqs?

	-ss
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Fri 2019-09-06 11:06:27, Peter Zijlstra wrote:
> On Thu, Sep 05, 2019 at 04:31:18PM +0200, Peter Zijlstra wrote:
> > So I have something roughly like the below; I'm suggesting you add the
> > line with + on:
> >
> >   int early_vprintk(const char *fmt, va_list args)
> >   {
> > 	char buf[256]; // teh suck!
> > 	int old, n = vscnprintf(buf, sizeof(buf), fmt, args);
> >
> > 	old = cpu_lock();
> > +	printk_buffer_store(buf, n);
> > 	early_console->write(early_console, buf, n);
> > 	cpu_unlock(old);
> >
> > 	return n;
> >   }
> >
> > (yes, yes, we can get rid of the on-stack @buf thing with a
> > reserve+commit API, but who cares :-))
>
> Another approach is something like:
>
> DEFINE_PER_CPU(int, printk_nest);
> DEFINE_PER_CPU(char, printk_line[4][256]);
>
> int vprintk(const char *fmt, va_list args)
> {
> 	int c, n, i;
> 	char *buf;
>
> 	preempt_disable();
> 	i = min(3, this_cpu_inc_return(printk_nest) - 1);
> 	buf = this_cpu_ptr(printk_line[i]);
> 	n = vscnprintf(buf, 256, fmt, args);
>
> 	c = cpu_lock();
> 	printk_buffer_store(buf, n);
> 	if (early_console)
> 		early_console->write(early_console, buf, n);
> 	cpu_unlock(c);
>
> 	this_cpu_dec(printk_nest);
> 	preempt_enable();
>
> 	return n;
> }
>
> Again, simple and straight forward (and I'm sure it's been mentioned
> before too).
>
> We really should not be making this stuff harder than it needs to be
> (and anybody whining about lines longer than 256 characters can just go
> away, those are unreadable anyway).

I wish it was that simple. It is possible that I see it too
complicated. But this comes to my mind:

1. The simple printk_buffer_store(buf, n) is not NMI safe. For this,
   we might need the reserve-store approach.

2. The simple approach works only with lockless consoles. We need
   something else for the rest at least for NMI. Simple offloading
   to a kthread has been blocked for years. People wanted the
   trylock-and-flush-immediately approach.

3. console_lock works in tty as a big kernel lock. I do not know
   much details. But people familiar with the code said that
   it was a disaster. I assume that tty is still rather
   important console. I am not sure how it would fit into the
   simple approach.

4. The console handling has got non-synchronous (console_trylock)
   quite early (ver 2.4.10, year 2001). The reason was to not
   serialize CPUs by the speed of the console.

   Serialized output could remove many troubles. The logic in
   console_unlock() is really crazy. It might be acceptable
   for debugging. But is it acceptable on production systems?

5. John planned to use the cpu_lock in the lockless consoles.
   I wonder if it was only in the console->write() callback
   or if it would spread the lock more widely.

6. One huge nightmare is panic() and code called from there.
   It is a maze of hacks, including arch-specific code, to
   prevent deadlocks and get the messages out.

   Any lock might be blocked on any CPU at the moment. Or it
   might become blocked when CPUs are stopped by NMI.

   Fully lock-less log buffer might save us some headache.
   I am not sure whether a single lock shared between printk()
   writers and console drivers will make the situation easier
   or more complicated.

7. People would complain when continuous lines become less
   reliable. It might be most visible when mixing backtraces
   from all CPUs. Simple sorting by prefix will not make
   it readable. The historic way was to synchronize CPUs
   by a spin lock. But then the cpu_lock() could cause
   deadlock.

I would be really happy when we could ignore some of the problems
or find an easy solution. I just want to make sure that we take
into account all the known aspects.

I am sure that we could do better than we do now. I do not want
to block any improvements. I am just a bit lost in the many
black corners.

Best Regards,
Petr
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Fri, Sep 06, 2019 at 07:09:43PM +0900, Sergey Senozhatsky wrote:

> ---
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 139c310049b1..9c73eb6259ce 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -103,7 +103,10 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
>  	if (atomic_cmpxchg(&s->len, len, len + add) != len)
>  		goto again;
>
> -	queue_flush_work(s);
> +	if (early_console)
> +		early_console->write(early_console, s->buffer + len, add);
> +	else
> +		queue_flush_work(s);
>  	return add;
>  }

You've not been following along, that generates absolutely unreadable
garbage.
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On (09/06/19 11:06), Peter Zijlstra wrote:
> Another approach is something like:
>
> DEFINE_PER_CPU(int, printk_nest);
> DEFINE_PER_CPU(char, printk_line[4][256]);
>
> int vprintk(const char *fmt, va_list args)
> {
> 	int c, n, i;
> 	char *buf;
>
> 	preempt_disable();
> 	i = min(3, this_cpu_inc_return(printk_nest) - 1);
> 	buf = this_cpu_ptr(printk_line[i]);
> 	n = vscnprintf(buf, 256, fmt, args);
>
> 	c = cpu_lock();
> 	printk_buffer_store(buf, n);
> 	if (early_console)
> 		early_console->write(early_console, buf, n);
> 	cpu_unlock(c);
>
> 	this_cpu_dec(printk_nest);
> 	preempt_enable();
>
> 	return n;
> }
>
> Again, simple and straight forward (and I'm sure it's been mentioned
> before too).

:)

---

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 139c310049b1..9c73eb6259ce 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -103,7 +103,10 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s,
 	if (atomic_cmpxchg(&s->len, len, len + add) != len)
 		goto again;
 
-	queue_flush_work(s);
+	if (early_console)
+		early_console->write(early_console, s->buffer + len, add);
+	else
+		queue_flush_work(s);
 	return add;
 }

---

	-ss
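
The context lines of the diff above show only the tail of printk_safe_log_store(): the atomic_cmpxchg() on s->len with a retry via `goto again`. A userspace sketch of that publish-or-retry pattern follows, under the assumption that the text is written at the current length first and only then published. The `sketch_*` names and the 64-byte buffer are illustrative, not taken from printk_safe.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SKETCH_BUF_SIZE 64	/* illustrative size */

struct sketch_seq_buf {
	atomic_int len;			/* number of published bytes */
	char buffer[SKETCH_BUF_SIZE];
};

/*
 * Append up to n bytes of msg. The text is copied at the current
 * length and then published by advancing len with compare-and-swap.
 * If len moved in the meantime (a nested context appended its own
 * text), the copy is redone at the new position. Returns the number
 * of bytes stored, or 0 when the buffer is full.
 *
 * Assumption: the buffer is per-CPU, so any interrupting writer runs
 * to completion before this function resumes.
 */
static int sketch_log_store(struct sketch_seq_buf *s, const char *msg, int n)
{
	int len, add;

	assert(n >= 0);
	for (;;) {
		len = atomic_load(&s->len);
		add = SKETCH_BUF_SIZE - len;
		if (add > n)
			add = n;
		if (add <= 0)
			return 0;

		memcpy(s->buffer + len, msg, (size_t)add);
		if (atomic_compare_exchange_strong(&s->len, &len, len + add))
			return add;
	}
}
```

Note that this is not safe for concurrent writers on different CPUs, because two of them could copy into the same region before either publishes; that gap is what a reserve+commit scheme closes.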
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Thu 2019-09-05 12:11:01, Steven Rostedt wrote:
>
> [ Added Ted and Linux Plumbers ]
>
> On Thu, 5 Sep 2019 17:38:21 +0200 (CEST)
> Thomas Gleixner wrote:
>
> > On Thu, 5 Sep 2019, Peter Zijlstra wrote:
> > > On Thu, Sep 05, 2019 at 03:05:13PM +0200, Petr Mladek wrote:
> > > > The alternative lockless approach is still more complicated than
> > > > the serialized one. But I think that it is manageable thanks to
> > > > the simplified state tracking. And I might safe use some pain
> > > > in the long term.
> > >
> > > I've not looked at it yet, sorry. But per the above argument of needing
> > > the CPU serialization _anyway_, I don't see a compelling reason not to
> > > use it.
> > >
> > > It is simple, it works. Let's use it.
> > >
> > > If you really fancy a multi-writer buffer, you can always switch to one
> > > later, if you can convince someone it actually brings benefits and not
> > > just head-aches.
> >
> > Can we please grab one of the TBD slots at kernel summit next week, sit
> > down in a room and hash that out?
> >
>
> We should definitely be able to find a room that will be available next
> week.

Sounds great. I am blocked only during Livepatching miniconference
that is scheduled on Wednesday, Sep 11 at 15:00 (basically the very
last slot).

Best Regards,
Petr
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Thu, Sep 05, 2019 at 04:31:18PM +0200, Peter Zijlstra wrote:
> So I have something roughly like the below; I'm suggesting you add the
> line with + on:
>
>   int early_vprintk(const char *fmt, va_list args)
>   {
> 	char buf[256]; // teh suck!
> 	int old, n = vscnprintf(buf, sizeof(buf), fmt, args);
>
> 	old = cpu_lock();
> +	printk_buffer_store(buf, n);
> 	early_console->write(early_console, buf, n);
> 	cpu_unlock(old);
>
> 	return n;
>   }
>
> (yes, yes, we can get rid of the on-stack @buf thing with a
> reserve+commit API, but who cares :-))

Another approach is something like:

DEFINE_PER_CPU(int, printk_nest);
DEFINE_PER_CPU(char, printk_line[4][256]);

int vprintk(const char *fmt, va_list args)
{
	int c, n, i;
	char *buf;

	preempt_disable();
	i = min(3, this_cpu_inc_return(printk_nest) - 1);
	buf = this_cpu_ptr(printk_line[i]);
	n = vscnprintf(buf, 256, fmt, args);

	c = cpu_lock();
	printk_buffer_store(buf, n);
	if (early_console)
		early_console->write(early_console, buf, n);
	cpu_unlock(c);

	this_cpu_dec(printk_nest);
	preempt_enable();

	return n;
}

Again, simple and straight forward (and I'm sure it's been mentioned
before too).

We really should not be making this stuff harder than it needs to be
(and anybody whining about lines longer than 256 characters can just go
away, those are unreadable anyway).
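
The nesting counter in the second variant picks one of four line buffers per context level (task, softirq, hardirq, NMI, presumably), and clamps deeper nesting onto the last one. A single-"CPU" userspace sketch of just that buffer selection is below; the `sketch_*` names are hypothetical, plain globals stand in for the per-CPU variables, and `sketch_vscnprintf()` is a stand-in that mimics the kernel helper's return convention (characters actually stored, never the would-be length).

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_NEST_MAX 4
#define SKETCH_LINE_MAX 256

/* Globals model one CPU's per-CPU data. */
static int sketch_nest;
static char sketch_line[SKETCH_NEST_MAX][SKETCH_LINE_MAX];

/* Enter one nesting level and return its line buffer (clamped). */
static char *sketch_line_get(void)
{
	int i = sketch_nest++;

	if (i > SKETCH_NEST_MAX - 1)
		i = SKETCH_NEST_MAX - 1;
	return sketch_line[i];
}

/* Leave the nesting level entered by sketch_line_get(). */
static void sketch_line_put(void)
{
	assert(sketch_nest > 0);
	sketch_nest--;
}

/* Like vsnprintf(), but returns the number of characters stored. */
static int sketch_vscnprintf(char *buf, size_t size, const char *fmt,
			     va_list args)
{
	int n = vsnprintf(buf, size, fmt, args);

	if (n < 0 || size == 0)
		return 0;
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Format one line into the buffer of the current nesting level. */
static int sketch_printk(const char *fmt, ...)
{
	char *buf = sketch_line_get();
	va_list args;
	int n;

	va_start(args, fmt);
	n = sketch_vscnprintf(buf, SKETCH_LINE_MAX, fmt, args);
	va_end(args);
	/* Real code would store and emit buf here under the cpu-lock. */
	sketch_line_put();

	return n;
}
```

Clamping means a fifth nested context would scribble over the fourth context's line, which the original accepts as a corner case; lines longer than the buffer are truncated rather than split.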
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On 2019-09-05, Steven Rostedt wrote:
>>> But per the above argument of needing the CPU serialization
>>> _anyway_, I don't see a compelling reason not to use it.
>>>
>>> It is simple, it works. Let's use it.
>>>
>>> If you really fancy a multi-writer buffer, you can always switch to
>>> one later, if you can convince someone it actually brings benefits
>>> and not just head-aches.
>>
>> Can we please grab one of the TBD slots at kernel summit next week,
>> sit down in a room and hash that out?
>>
>
> We should definitely be able to find a room that will be available
> next week.

FWIW, on Monday at 12:45 I am giving a talk[0] on the printk
rework. I'll be dedicating a few slides to presenting the lockless
multi-writer design, but will also talk about the serialized CPU
approach from RFCv1.

John Ogness

[0] https://www.linuxplumbersconf.org/event/4/contributions/290/
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
[ Added Ted and Linux Plumbers ] On Thu, 5 Sep 2019 17:38:21 +0200 (CEST) Thomas Gleixner wrote: > On Thu, 5 Sep 2019, Peter Zijlstra wrote: > > On Thu, Sep 05, 2019 at 03:05:13PM +0200, Petr Mladek wrote: > > > The alternative lockless approach is still more complicated than > > > the serialized one. But I think that it is manageable thanks to > > > the simplified state tracking. And I might safe use some pain > > > in the long term. > > > > I've not looked at it yet, sorry. But per the above argument of needing > > the CPU serialization _anyway_, I don't see a compelling reason not to > > use it. > > > > It is simple, it works. Let's use it. > > > > If you really fancy a multi-writer buffer, you can always switch to one > > later, if you can convince someone it actually brings benefits and not > > just head-aches. > > Can we please grab one of the TBD slots at kernel summit next week, sit > down in a room and hash that out? > We should definitely be able to find a room that will be available next week. -- Steve
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Thu, 5 Sep 2019, Peter Zijlstra wrote: > On Thu, Sep 05, 2019 at 03:05:13PM +0200, Petr Mladek wrote: > > The alternative lockless approach is still more complicated than > > the serialized one. But I think that it is manageable thanks to > > the simplified state tracking. And I might safe use some pain > > in the long term. > > I've not looked at it yet, sorry. But per the above argument of needing > the CPU serialization _anyway_, I don't see a compelling reason not to > use it. > > It is simple, it works. Let's use it. > > If you really fancy a multi-writer buffer, you can always switch to one > later, if you can convince someone it actually brings benefits and not > just head-aches. Can we please grab one of the TBD slots at kernel summit next week, sit down in a room and hash that out? Thanks, tglx
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Thu, Sep 05, 2019 at 03:05:13PM +0200, Petr Mladek wrote:
> The serialized approach used a lock. It was re-entrant and thus less
> error-prone but still a lock.
>
> The lock was planned to be used not only to access the buffer but also
> for eventual locking inside lockless consoles. It might allow having
> some synchronization even in lockless consoles. But it
> would be big-kernel-lock-like style. It might create yet
> another maze of problems.

I really don't see your point. All it does is limit buffer writers to a single CPU, and does the same for the atomic/early console output. But it must very much be a leaf lock -- that is, there must not be any locking inside it -- and that is fine, if a console cannot do lockless output, it simply cannot be marked as having an atomic/early console.

You've seen the force_earlyprintk patches I use [*], that stuff works and is infinitely better than the current printk trainwreck -- and it uses exactly such serialization -- although I only added it to make the output actually readable.

And _that_ is exactly why I propose adding it, you need it _anyway_.

So the argument goes like:

 - synchronous output to lockless consoles (early serial) is mandatory

 - such output needs to be CPU serialized, otherwise it becomes unreadable garbage.

 - since we need that serialization anyway, might as well lift it up one layer and put it around the buffer.

Since a single-cpu buffer writer can be wait free (and relatively simple), the only possible waiting is on the lockless console (polling until the UART is ready for its next byte). There is nothing else. It will make progress.

> If we remove per-CPU buffers in NMI. We would need to synchronize
> again printing backtraces from all CPUs. Otherwise they would get
> mixed and hard to read. It might be solved by some prefix and
> sorting in userspace but...

It must have cpu prefixes anyway; the multi-writer thing will equally mix them together. This is a complete non sequitur.
That current printk stuff is just pure and utter crap. Those NMI buffers are a trainwreck and need to die a horrible death.

> I agree that this lockless variant is really complicated. I am not
> able to prove that it is race free as it is now. I understand
> the algorithm. But there are too many synchronization points.
>
> Peter, have you seen my alternative approach, please? See
> https://lore.kernel.org/lkml/20190704103321.10022-1-pmla...@suse.com/
>
> It uses two tricks:
>
>    1. Two bits in the sequence number are used to track the state
>       of the related data. It allows implementing the entire
>       life cycle of each entry using atomic operation on a single
>       variable.
>
>    2. There is a helper function to read valid data for each entry,
>       see prb_read_desc(). It checks the state before and after
>       reading the data to make sure that they are valid. And
>       it includes the needed read barriers. As a result there
>       are only three explicit barriers in the code. All others
>       are implicitly done by cmpxchg() atomic operations.
>
> The alternative lockless approach is still more complicated than
> the serialized one. But I think that it is manageable thanks to
> the simplified state tracking. And it might save us some pain
> in the long term.

I've not looked at it yet, sorry. But per the above argument of needing the CPU serialization _anyway_, I don't see a compelling reason not to use it.

It is simple, it works. Let's use it.

If you really fancy a multi-writer buffer, you can always switch to one later, if you can convince someone it actually brings benefits and not just head-aches.

So I have something roughly like the below; I'm suggesting you add the line with + on:

  int early_vprintk(const char *fmt, va_list args)
  {
  	char buf[256]; // teh suck!
  	int old, n = vscnprintf(buf, sizeof(buf), fmt, args);

  	old = cpu_lock();
+ 	printk_buffer_store(buf, n);
  	early_console->write(early_console, buf, n);
  	cpu_unlock(old);

  	return n;
  }

(yes, yes, we can get rid of the on-stack @buf thing with a reserve+commit API, but who cares :-))

[*] git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git debug/experimental
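The mail does not show what cpu_lock()/cpu_unlock() look like. One plausible shape for a re-entrant, CPU-owned leaf lock is sketched below in userspace C11. Everything here is an assumption for illustration: the _sketch names, the explicit cpu argument (the cpu_lock() in the mail takes none), and the omission of interrupt handling.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Id of the CPU currently holding the lock, or NO_OWNER. */
static atomic_int lock_owner = NO_OWNER;

/*
 * Acquire the lock for `cpu`. Returns the previous owner value, which the
 * caller hands back to cpu_unlock_sketch(): NO_OWNER for the outermost
 * acquisition, `cpu` itself for a nested one on the same CPU.
 */
static int cpu_lock_sketch(int cpu)
{
	assert(cpu >= 0);

	for (;;) {
		int old = atomic_load_explicit(&lock_owner, memory_order_relaxed);

		if (old == cpu)
			return old;	/* nested acquisition: we already own it */

		if (old == NO_OWNER) {
			int expected = NO_OWNER;

			if (atomic_compare_exchange_weak_explicit(&lock_owner,
					&expected, cpu,
					memory_order_acquire,
					memory_order_relaxed))
				return NO_OWNER;
		}
		/* Another CPU owns the lock (or the CAS failed): spin and retry. */
	}
}

/* Restore the owner value that the matching cpu_lock_sketch() returned. */
static void cpu_unlock_sketch(int old)
{
	atomic_store_explicit(&lock_owner, old, memory_order_release);
}
```

Returning the previous owner is what makes nesting cheap in this sketch: an inner unlock writes the CPU's own id back and so keeps the lock held, and only the outermost unlock writes NO_OWNER and releases it.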
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Wed 2019-09-04 14:35:31, Peter Zijlstra wrote:
> On Thu, Aug 08, 2019 at 12:32:25AM +0206, John Ogness wrote:
> > Hello,
> >
> > This is a follow-up RFC on the work to re-implement much of
> > the core of printk. The threads for the previous RFC versions
> > are here: v1[0], v2[1], v3[2].
> >
> > This series only builds upon v3 (i.e. the first part of this
> > series is exactly v3). The main purpose of this series is to
> > replace the current printk ringbuffer with the new
> > ringbuffer. As was discussed[3], this is a conservative
> > first step to rework printk. For example, all logbuf_lock
> > usage is kept even though the new ringbuffer does not
> > require it. This avoids any side-effect bugs in case the
> > logbuf_lock is (unintentionally) synchronizing more than
> > just the ringbuffer. However, this also means that the
> > series does not bring any improvements, just swapping out
> > implementations. A future patch will remove the logbuf_lock.
>
> So after reading most of the first patch (and it looks _much_ better than
> previous times), I'm left wondering *why* ?!
>
> That is, why do we need this complexity, as compared to that
> CPU serialized approach?

The serialized approach used a lock. It was re-entrant and thus less error-prone but still a lock.

The lock was planned to be used not only to access the buffer but also for eventual locking inside lockless consoles. It might allow having some synchronization even in lockless consoles. But it would be big-kernel-lock-like style. It might create yet another maze of problems.

If we remove the per-CPU buffers in NMI, we would need to synchronize again printing backtraces from all CPUs. Otherwise they would get mixed and hard to read. It might be solved by some prefix and sorting in userspace but...

This is why I asked to see a fully lockless code to see how much more complicated it was. John told me that he had an early version of it around.

I agree that this lockless variant is really complicated. I am not able to prove that it is race free as it is now. I understand the algorithm. But there are too many synchronization points.

Peter, have you seen my alternative approach, please? See
https://lore.kernel.org/lkml/20190704103321.10022-1-pmla...@suse.com/

It uses two tricks:

   1. Two bits in the sequence number are used to track the state
      of the related data. It allows implementing the entire
      life cycle of each entry using atomic operation on a single
      variable.

   2. There is a helper function to read valid data for each entry,
      see prb_read_desc(). It checks the state before and after
      reading the data to make sure that they are valid. And
      it includes the needed read barriers. As a result there
      are only three explicit barriers in the code. All others
      are implicitly done by cmpxchg() atomic operations.

The alternative lockless approach is still more complicated than the serialized one. But I think that it is manageable thanks to the simplified state tracking. And it might save us some pain in the long term.

> In my book simpler is better here. printk() is an absolute utter slow
> path anyway, nobody cares about the performance much, and I'm thinking
> that it should be plenty fast enough as long as you don't run a
> synchronous serial output (which is exactly what I do do/require
> anyway).

I fully agree.

Best Regards,
Petr
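The first trick, keeping the state in two bits of the sequence number so that every life-cycle step is one compare-and-exchange on one variable, can be illustrated in userspace C11. This is only a sketch of the general idea, not the code from the linked series: the state names, the bit position and the desc_*() helpers are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical life cycle of one entry; the real series may use other states. */
enum desc_state {
	DESC_FREE      = 0,
	DESC_RESERVED  = 1,
	DESC_COMMITTED = 2,
	DESC_REUSABLE  = 3,
};

#define STATE_SHIFT 62				/* top two bits hold the state */
#define SEQ_MASK    ((UINT64_C(1) << STATE_SHIFT) - 1)

/* Pack a state and a sequence number into one 64-bit word. */
static uint64_t desc_pack(enum desc_state st, uint64_t seq)
{
	assert((seq & ~SEQ_MASK) == 0);
	return ((uint64_t)st << STATE_SHIFT) | seq;
}

static enum desc_state desc_state_of(uint64_t v)
{
	return (enum desc_state)(v >> STATE_SHIFT);
}

static uint64_t desc_seq_of(uint64_t v)
{
	return v & SEQ_MASK;
}

/*
 * Move entry `seq` from state `from` to state `to`. The single
 * compare-and-exchange fails if either the state or the sequence number
 * changed underneath us, so both are validated in one atomic step.
 */
static bool desc_transition(_Atomic uint64_t *desc, uint64_t seq,
			    enum desc_state from, enum desc_state to)
{
	uint64_t expected = desc_pack(from, seq);

	return atomic_compare_exchange_strong(desc, &expected,
					      desc_pack(to, seq));
}
```

A writer would call desc_transition(&d, seq, DESC_FREE, DESC_RESERVED) to claim an entry and later move it to DESC_COMMITTED. A second writer racing for the same entry, or a stale caller holding an old sequence number, simply sees the exchange fail.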
Re: [RFC PATCH v4 0/9] printk: new ringbuffer implementation
On Thu, Aug 08, 2019 at 12:32:25AM +0206, John Ogness wrote:
> Hello,
>
> This is a follow-up RFC on the work to re-implement much of
> the core of printk. The threads for the previous RFC versions
> are here: v1[0], v2[1], v3[2].
>
> This series only builds upon v3 (i.e. the first part of this
> series is exactly v3). The main purpose of this series is to
> replace the current printk ringbuffer with the new
> ringbuffer. As was discussed[3], this is a conservative
> first step to rework printk. For example, all logbuf_lock
> usage is kept even though the new ringbuffer does not
> require it. This avoids any side-effect bugs in case the
> logbuf_lock is (unintentionally) synchronizing more than
> just the ringbuffer. However, this also means that the
> series does not bring any improvements, just swapping out
> implementations. A future patch will remove the logbuf_lock.

So after reading most of the first patch (and it looks _much_ better than previous times), I'm left wondering *why* ?!

That is, why do we need this complexity, as compared to that CPU serialized approach?

What do we hope to gain by doing a multi-writer buffer? Yes, it is awesome, but from where I'm sitting it is also completely silly, because we'll want to CPU serialize the serial console anyway (otherwise it gets to be a completely unreadable mess).

By having the whole thing CPU serialized we lose multi-writer and consequently the buffer gets to be significantly simpler (as you know; because ISTR you've actually done this before -- but I cannot find here why that didn't live).

In my book simpler is better here. printk() is an absolute utter slow path anyway, nobody cares about the performance much, and I'm thinking that it should be plenty fast enough as long as you don't run a synchronous serial output (which is exactly what I do do/require anyway).

So can we have a few words to explain why we need multi-writer and all this complexity?
[RFC PATCH v4 0/9] printk: new ringbuffer implementation
Hello, This is a follow-up RFC on the work to re-implement much of the core of printk. The threads for the previous RFC versions are here: v1[0], v2[1], v3[2]. This series only builds upon v3 (i.e. the first part of this series is exactly v3). The main purpose of this series is to replace the current printk ringbuffer with the new ringbuffer. As was discussed[3], this is a conservative first step to rework printk. For example, all logbuf_lock usage is kept even though the new ringbuffer does not require it. This avoids any side-effect bugs in case the logbuf_lock is (unintentionally) synchronizing more than just the ringbuffer. However, this also means that the series does not bring any improvements, just swapping out implementations. A future patch will remove the logbuf_lock. Except for the test module (patches 2 and 6), the rest may already be interesting for mainline as is. I have tested the various interfaces (console, /dev/kmsg, syslog, kmsg_dump) and their features and all looks good AFAICT. The patches can be broken down as follows: 1-2: the previously posted RFCv3 3-7: addresses minor issues from RFCv3 8: adds new high-level ringbuffer functions to support printk (nothing involving new memory barriers) 9: replace the ringbuffer usage in printk.c One important thing to know (as is mentioned in the commit message of patch 9), there are 2 externally visible changes: - vmcore info changes - powerpc powernv/opal memdump of log discontinued I have no idea how acceptable these changes are. I will not be posting any further printk patches until I have received some feedback on this. I appreciate all the help so far. I realize that this is a lot of code to go through. The series is based on 5.3-rc3. I would encourage people to apply the series and give it a run. I expect that you will not notice any difference with your printk behaviour. 
John Ogness

[0] https://lkml.kernel.org/r/20190212143003.48446-1-john.ogn...@linutronix.de
[1] https://lkml.kernel.org/r/20190607162349.18199-1-john.ogn...@linutronix.de
[2] https://lkml.kernel.org/r/2019072701.11260-1-john.ogn...@linutronix.de
[3] https://lkml.kernel.org/r/87y35hn6ih@linutronix.de

John Ogness (9):
  printk-rb: add a new printk ringbuffer implementation
  printk-rb: add test module
  printk-rb: fix missing includes/exports
  printk-rb: initialize new descriptors as invalid
  printk-rb: remove extra data buffer size allocation
  printk-rb: adjust test module ringbuffer sizes
  printk-rb: increase size of seq and size variables
  printk-rb: new functionality to support printk
  printk: use a new ringbuffer implementation

 arch/powerpc/platforms/powernv/opal.c |   22 +-
 include/linux/kmsg_dump.h             |    6 +-
 include/linux/printk.h                |   12 -
 kernel/printk/Makefile                |    5 +
 kernel/printk/dataring.c              |  809 ++
 kernel/printk/dataring.h              |  108 +++
 kernel/printk/numlist.c               |  376 +
 kernel/printk/numlist.h               |   72 ++
 kernel/printk/printk.c                |  745 +
 kernel/printk/ringbuffer.c            | 1079 +
 kernel/printk/ringbuffer.h            |  354
 kernel/printk/test_prb.c              |  256 ++
 12 files changed, 3450 insertions(+), 394 deletions(-)
 create mode 100644 kernel/printk/dataring.c
 create mode 100644 kernel/printk/dataring.h
 create mode 100644 kernel/printk/numlist.c
 create mode 100644 kernel/printk/numlist.h
 create mode 100644 kernel/printk/ringbuffer.c
 create mode 100644 kernel/printk/ringbuffer.h
 create mode 100644 kernel/printk/test_prb.c

--
2.20.1
Re: [RFC PATCH v4 0/9]
On 08/06/2015 04:31 PM, Shawn Lin wrote:
> On 2015/8/6 15:08, Jaehoon Chung wrote:
>> Hi, Shawn.
>>
>> I remembered that Krzysztof has mentioned "Fix the title of cover letter."
>> Your cover letter's title is nothing.. "[RFC PATCH v4 0/9] " ??
>> [RFC PATCH v4 0/9] your title...
> Sorry, I forgot it, and will fix in next version...

No problem :) Next time, please add a title to your cover letter.

Best Regards,
Jaehoon Chung

>
>> Best Regards,
>> Jaehoon Chung
>>
>> On 08/06/2015 03:44 PM, Shawn Lin wrote:
>>> Add external dma support for Synopsys MSHC
>>>
>>> Synopsys DesignWare mobile storage host controller supports three
>>> types of transfer mode: pio, internal dma and external dma. However,
>>> dw_mmc only supports pio and internal dma now. Thus some platforms
>>> using dw-mshc integrated with generic dma can't work in dma mode. So we
>>> submit this patch to achieve it.
>>>
>>> And the config option, CONFIG_MMC_DW_IDMAC, was added by Will Newton
>>> (commit:f95f3850) for the first version of dw_mmc and has never been
>>> touched since then. At that time dt-bindings hadn't been introduced into
>>> dw_mmc yet, which means we had to select CONFIG_MMC_DW_IDMAC to enable
>>> internal dma mode at compile time. Nowadays, device-tree helps us to
>>> support a variety of boards with one kernel. That's why we need to
>>> remove it and decide the transfer mode by reading dw_mmc's HCON reg at
>>> runtime.
>>>
>>> This RFC patch needs lots of ACKs. I know it's hard, but it does need
>>> someone to make the running.
>>>
>>> Patch does the following things:
>>> - remove CONFIG_MMC_DW_IDMAC config option
>>> - add bindings for edmac used by synopsys-dw-mshc
>>>   at runtime
>>> - add edmac support for synopsys-dw-mshc
>>>
>>> Patch is based on next of git://git.linaro.org/people/ulf.hansson/mmc
>>>
>>>
>>> Changes in v4:
>>> - remove "host->trans_mode" and use "host->use_dma" to indicate
>>>   transfer mode.
>>> - remove all dt-bindings' changes since we don't need new properties.
>>> - check transfer mode at runtime by reading HCON reg >>> - spilt defconfig changes for each sub-architecture >>> - fix the title of cover letter >>> - reuse some code for reducing code size >>> >>> Changes in v3: >>> - choose transfer mode at runtime >>> - remove all CONFIG_MMC_DW_IDMAC config option >>> - add supports-idmac property for some platforms >>> >>> Changes in v2: >>> - Fix typo of dev_info msg >>> - remove unused dmach from declaration of dw_mci_dma_slave >>> >>> Shawn Lin (9): >>>mmc: dw_mmc: Add external dma interface support >>>Documentation: synopsys-dw-mshc: add bindings for idmac and edmac >>>mips: pistachio_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arc: axs10x_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arm: exynos_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arm: hisi_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arm: lpc18xx_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arm: multi_v7_defconfig: remove CONFIG_MMC_DW_IDMAC >>>arm: zx_defconfig: remove CONFIG_MMC_DW_IDMAC >>> >>> .../devicetree/bindings/mmc/synopsys-dw-mshc.txt | 25 ++ >>> arch/arc/configs/axs101_defconfig | 1 - >>> arch/arc/configs/axs103_defconfig | 1 - >>> arch/arc/configs/axs103_smp_defconfig | 1 - >>> arch/arm/configs/exynos_defconfig | 1 - >>> arch/arm/configs/hisi_defconfig| 1 - >>> arch/arm/configs/lpc18xx_defconfig | 1 - >>> arch/arm/configs/multi_v7_defconfig| 1 - >>> arch/arm/configs/zx_defconfig | 1 - >>> arch/mips/configs/pistachio_defconfig | 1 - >>> drivers/mmc/host/Kconfig | 11 +- >>> drivers/mmc/host/dw_mmc-pltfm.c| 2 + >>> drivers/mmc/host/dw_mmc.c | 258 >>> + >>> include/linux/mmc/dw_mmc.h | 27 ++- >>> 14 files changed, 257 insertions(+), 75 deletions(-) >>> >> >> >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH v4 0/9]
On 2015/8/6 15:08, Jaehoon Chung wrote:
> Hi, Shawn.
>
> I remembered that Krzysztof has mentioned "Fix the title of cover letter."
> Your cover letter's title is nothing.. "[RFC PATCH v4 0/9] " ??
> [RFC PATCH v4 0/9] your title...

Sorry, I forgot it, and will fix in next version...

> Best Regards,
> Jaehoon Chung
>
> On 08/06/2015 03:44 PM, Shawn Lin wrote:
>> Add external dma support for Synopsys MSHC
>>
>> Synopsys DesignWare mobile storage host controller supports three
>> types of transfer mode: pio, internal dma and external dma. However,
>> dw_mmc only supports pio and internal dma now. Thus some platforms
>> using dw-mshc integrated with generic dma can't work in dma mode. So we
>> submit this patch to achieve it.
>>
>> And the config option, CONFIG_MMC_DW_IDMAC, was added by Will Newton
>> (commit:f95f3850) for the first version of dw_mmc and has never been
>> touched since then. At that time dt-bindings hadn't been introduced into
>> dw_mmc yet, which means we had to select CONFIG_MMC_DW_IDMAC to enable
>> internal dma mode at compile time. Nowadays, device-tree helps us to
>> support a variety of boards with one kernel. That's why we need to
>> remove it and decide the transfer mode by reading dw_mmc's HCON reg at
>> runtime.
>>
>> This RFC patch needs lots of ACKs. I know it's hard, but it does need
>> someone to make the running.
>>
>> Patch does the following things:
>> - remove CONFIG_MMC_DW_IDMAC config option
>> - add bindings for edmac used by synopsys-dw-mshc
>>   at runtime
>> - add edmac support for synopsys-dw-mshc
>>
>> Patch is based on next of git://git.linaro.org/people/ulf.hansson/mmc
>>
>>
>> Changes in v4:
>> - remove "host->trans_mode" and use "host->use_dma" to indicate
>>   transfer mode.
>> - remove all dt-bindings' changes since we don't need new properties.
>> - check transfer mode at runtime by reading HCON reg
>> - split defconfig changes for each sub-architecture
>> - fix the title of cover letter
>> - reuse some code for reducing code size
>>
>> Changes in v3:
>> - choose transfer mode at runtime
>> - remove all CONFIG_MMC_DW_IDMAC config option
>> - add supports-idmac property for some platforms
>>
>> Changes in v2:
>> - Fix typo of dev_info msg
>> - remove unused dmach from declaration of dw_mci_dma_slave
>>
>> Shawn Lin (9):
>>   mmc: dw_mmc: Add external dma interface support
>>   Documentation: synopsys-dw-mshc: add bindings for idmac and edmac
>>   mips: pistachio_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arc: axs10x_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arm: exynos_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arm: hisi_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arm: lpc18xx_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arm: multi_v7_defconfig: remove CONFIG_MMC_DW_IDMAC
>>   arm: zx_defconfig: remove CONFIG_MMC_DW_IDMAC
>>
>>  .../devicetree/bindings/mmc/synopsys-dw-mshc.txt |  25 ++
>>  arch/arc/configs/axs101_defconfig                |   1 -
>>  arch/arc/configs/axs103_defconfig                |   1 -
>>  arch/arc/configs/axs103_smp_defconfig            |   1 -
>>  arch/arm/configs/exynos_defconfig                |   1 -
>>  arch/arm/configs/hisi_defconfig                  |   1 -
>>  arch/arm/configs/lpc18xx_defconfig               |   1 -
>>  arch/arm/configs/multi_v7_defconfig              |   1 -
>>  arch/arm/configs/zx_defconfig                    |   1 -
>>  arch/mips/configs/pistachio_defconfig            |   1 -
>>  drivers/mmc/host/Kconfig                         |  11 +-
>>  drivers/mmc/host/dw_mmc-pltfm.c                  |   2 +
>>  drivers/mmc/host/dw_mmc.c                        | 258 +
>>  include/linux/mmc/dw_mmc.h                       |  27 ++-
>>  14 files changed, 257 insertions(+), 75 deletions(-)

--
Shawn Lin
Re: [RFC PATCH v4 0/9]
Hi, Shawn. I remembered that Krzysztof has mentioned "Fix the title of cover letter." Your cover letter's title is nothing.. "[RFC PATCH v4 0/9] " ?? [RFC PATCH v4 0/9] your title... Best Regards, Jaehoon Chung On 08/06/2015 03:44 PM, Shawn Lin wrote: > Add external dma support for Synopsys MSHC > > Synopsys DesignWare mobile storage host controller supports three > types of transfer mode: pio, internal dma and external dma. However, > dw_mmc can only supports pio and internal dma now. Thus some platforms > using dw-mshc integrated with generic dma can't work in dma mode. So we > submit this patch to achieve it. > > And the config option, CONFIG_MMC_DW_IDMAC, was added by Will Newton > (commit:f95f3850) for the first version of dw_mmc and never be touched since > then. At that time dt-bindings hadn't been introduced into dw_mmc yet means > we should select CONFIG_MMC_DW_IDMAC to enable internal dma mode at compile > time. Nowadays, device-tree helps us to support a variety of boards with one > kernel. That's why we need to remove it and decide the transfer mode by > reading > dw_mmc's HCON reg at runtime. > > This RFC patch needs lots of ACKs. I know it's hard, but it does need someone > to make the running. > > Patch does the following things: > - remove CONFIG_MMC_DW_IDMAC config option > - add bindings for edmac used by synopsys-dw-mshc > at runtime > - add edmac support for synopsys-dw-mshc > > Patch is based on next of git://git.linaro.org/people/ulf.hansson/mmc > > > Changes in v4: > - remove "host->trans_mode" and use "host->use_dma" to indicate > transfer mode. > - remove all bt-bindings' changes since we don't need new properities. 
> - check transfer mode at runtime by reading HCON reg > - spilt defconfig changes for each sub-architecture > - fix the title of cover letter > - reuse some code for reducing code size > > Changes in v3: > - choose transfer mode at runtime > - remove all CONFIG_MMC_DW_IDMAC config option > - add supports-idmac property for some platforms > > Changes in v2: > - Fix typo of dev_info msg > - remove unused dmach from declaration of dw_mci_dma_slave > > Shawn Lin (9): > mmc: dw_mmc: Add external dma interface support > Documentation: synopsys-dw-mshc: add bindings for idmac and edmac > mips: pistachio_defconfig: remove CONFIG_MMC_DW_IDMAC > arc: axs10x_defconfig: remove CONFIG_MMC_DW_IDMAC > arm: exynos_defconfig: remove CONFIG_MMC_DW_IDMAC > arm: hisi_defconfig: remove CONFIG_MMC_DW_IDMAC > arm: lpc18xx_defconfig: remove CONFIG_MMC_DW_IDMAC > arm: multi_v7_defconfig: remove CONFIG_MMC_DW_IDMAC > arm: zx_defconfig: remove CONFIG_MMC_DW_IDMAC > > .../devicetree/bindings/mmc/synopsys-dw-mshc.txt | 25 ++ > arch/arc/configs/axs101_defconfig | 1 - > arch/arc/configs/axs103_defconfig | 1 - > arch/arc/configs/axs103_smp_defconfig | 1 - > arch/arm/configs/exynos_defconfig | 1 - > arch/arm/configs/hisi_defconfig| 1 - > arch/arm/configs/lpc18xx_defconfig | 1 - > arch/arm/configs/multi_v7_defconfig| 1 - > arch/arm/configs/zx_defconfig | 1 - > arch/mips/configs/pistachio_defconfig | 1 - > drivers/mmc/host/Kconfig | 11 +- > drivers/mmc/host/dw_mmc-pltfm.c| 2 + > drivers/mmc/host/dw_mmc.c | 258 > + > include/linux/mmc/dw_mmc.h | 27 ++- > 14 files changed, 257 insertions(+), 75 deletions(-) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC PATCH v4 0/9]
Add external dma support for Synopsys MSHC

Synopsys DesignWare mobile storage host controller supports three types of transfer mode: pio, internal dma and external dma. However, dw_mmc only supports pio and internal dma now. Thus some platforms using dw-mshc integrated with generic dma can't work in dma mode. So we submit this patch to achieve it.

And the config option, CONFIG_MMC_DW_IDMAC, was added by Will Newton (commit:f95f3850) for the first version of dw_mmc and has never been touched since then. At that time dt-bindings hadn't been introduced into dw_mmc yet, which means we had to select CONFIG_MMC_DW_IDMAC to enable internal dma mode at compile time. Nowadays, device-tree helps us to support a variety of boards with one kernel. That's why we need to remove it and decide the transfer mode by reading dw_mmc's HCON reg at runtime.

This RFC patch needs lots of ACKs. I know it's hard, but it does need someone to make the running.

Patch does the following things:
- remove CONFIG_MMC_DW_IDMAC config option
- add bindings for edmac used by synopsys-dw-mshc
  at runtime
- add edmac support for synopsys-dw-mshc

Patch is based on next of git://git.linaro.org/people/ulf.hansson/mmc


Changes in v4:
- remove "host->trans_mode" and use "host->use_dma" to indicate
  transfer mode.
- remove all dt-bindings' changes since we don't need new properties.
- check transfer mode at runtime by reading HCON reg
- split defconfig changes for each sub-architecture
- fix the title of cover letter
- reuse some code for reducing code size

Changes in v3:
- choose transfer mode at runtime
- remove all CONFIG_MMC_DW_IDMAC config option
- add supports-idmac property for some platforms

Changes in v2:
- Fix typo of dev_info msg
- remove unused dmach from declaration of dw_mci_dma_slave

Shawn Lin (9):
  mmc: dw_mmc: Add external dma interface support
  Documentation: synopsys-dw-mshc: add bindings for idmac and edmac
  mips: pistachio_defconfig: remove CONFIG_MMC_DW_IDMAC
  arc: axs10x_defconfig: remove CONFIG_MMC_DW_IDMAC
  arm: exynos_defconfig: remove CONFIG_MMC_DW_IDMAC
  arm: hisi_defconfig: remove CONFIG_MMC_DW_IDMAC
  arm: lpc18xx_defconfig: remove CONFIG_MMC_DW_IDMAC
  arm: multi_v7_defconfig: remove CONFIG_MMC_DW_IDMAC
  arm: zx_defconfig: remove CONFIG_MMC_DW_IDMAC

 .../devicetree/bindings/mmc/synopsys-dw-mshc.txt |  25 ++
 arch/arc/configs/axs101_defconfig                |   1 -
 arch/arc/configs/axs103_defconfig                |   1 -
 arch/arc/configs/axs103_smp_defconfig            |   1 -
 arch/arm/configs/exynos_defconfig                |   1 -
 arch/arm/configs/hisi_defconfig                  |   1 -
 arch/arm/configs/lpc18xx_defconfig               |   1 -
 arch/arm/configs/multi_v7_defconfig              |   1 -
 arch/arm/configs/zx_defconfig                    |   1 -
 arch/mips/configs/pistachio_defconfig            |   1 -
 drivers/mmc/host/Kconfig                         |  11 +-
 drivers/mmc/host/dw_mmc-pltfm.c                  |   2 +
 drivers/mmc/host/dw_mmc.c                        | 258 +
 include/linux/mmc/dw_mmc.h                       |  27 ++-
 14 files changed, 257 insertions(+), 75 deletions(-)

--
2.3.7
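The cover letter says the transfer mode is decided by reading dw_mmc's HCON register at runtime, but does not show how. The idea can be sketched as decoding a small field of the register value. The field position (bits 17:16), the four encodings and every identifier below are assumptions made for illustration only; they are not taken from this posting and should be checked against the controller databook and the actual patch.

```c
#include <stdint.h>

/* ASSUMPTION: the DMA interface type lives in a 2-bit field at bits 17:16. */
#define HCON_DMA_IF_SHIFT 16
#define HCON_DMA_IF_MASK  0x3u

/* ASSUMPTION: illustrative encodings of that field. */
enum dma_interface {
	DMA_IF_INTERNAL    = 0,	/* internal DMA controller (idmac) */
	DMA_IF_EXT_DW      = 1,	/* external DesignWare DMA */
	DMA_IF_EXT_GENERIC = 2,	/* external generic DMA */
	DMA_IF_NONE        = 3,	/* no DMA interface */
};

/* The three transfer modes named in the cover letter. */
enum trans_mode {
	TRANS_PIO,
	TRANS_IDMAC,
	TRANS_EDMAC,
};

/* Pick the transfer mode from a raw HCON value instead of a Kconfig option. */
static enum trans_mode pick_trans_mode(uint32_t hcon)
{
	switch ((hcon >> HCON_DMA_IF_SHIFT) & HCON_DMA_IF_MASK) {
	case DMA_IF_INTERNAL:
		return TRANS_IDMAC;
	case DMA_IF_EXT_DW:
	case DMA_IF_EXT_GENERIC:
		return TRANS_EDMAC;
	default:
		return TRANS_PIO;	/* no usable DMA interface */
	}
}
```

With a decode step like this, one kernel image can serve boards with either DMA flavour, which is the motivation given above for dropping CONFIG_MMC_DW_IDMAC.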
[RFC PATCH v4 0/9] CPU hotplug: stop_machine()-free CPU hotplug
Hi, This patchset removes CPU hotplug's dependence on stop_machine() from the CPU offline path and provides an alternative (set of APIs) to preempt_disable() to prevent CPUs from going offline, which can be invoked from atomic context. This is an RFC patchset with only a few call-sites of preempt_disable() converted to the new APIs for now, and the main goal is to get feedback on the design of the new atomic APIs and see if it serves as a viable replacement for stop_machine()-free CPU hotplug. A brief description of the algorithm is available in the "Changes in vN" section. Overview of the patches: --- Patch 1 introduces the new APIs that can be used from atomic context, to prevent CPUs from going offline. Patch 2 is a cleanup; it converts preprocessor macros to static inline functions. Patches 3 to 8 convert various call-sites to use the new APIs. Patch 9 is the one which actually removes stop_machine() from the CPU offline path. Changes in v4: -- The synchronization scheme has been simplified quite a bit, which makes it look a lot less complex than before. Some highlights: * Implicit ACKs: The earlier design required the readers to explicitly ACK the writer's signal. The new design uses implicit ACKs instead. The reader switching over to rwlock implicitly tells the writer to stop waiting for that reader. * No atomic operations: Since we got rid of explicit ACKs, we no longer have the need for a reader and a writer to update the same counter. So we can get rid of atomic ops too. Changes in v3: -- * Dropped the _light() and _full() variants of the APIs. Provided a single interface: get/put_online_cpus_atomic(). * Completely redesigned the synchronization mechanism again, to make it fast and scalable at the reader-side in the fast-path (when no hotplug writers are active). This new scheme also ensures that there is no possibility of deadlocks due to circular locking dependency. 
In summary, this provides the scalability and speed of per-cpu rwlocks (without actually using them), while avoiding the downside (deadlock possibilities) which is inherent in any per-cpu locking scheme that is meant to compete with preempt_disable()/enable() in terms of flexibility. The problem with using per-cpu locking to replace preempt_disable()/enable was explained here: https://lkml.org/lkml/2012/12/6/290 Basically we use per-cpu counters (for scalability) when no writers are active, and then switch to global rwlocks (for lock-safety) when a writer becomes active. It is a slightly complex scheme, but it is based on standard principles of distributed algorithms. Changes in v2: - * Completely redesigned the synchronization scheme to avoid using any extra cpumasks. * Provided APIs for 2 types of atomic hotplug readers: "light" (for light-weight) and "full". We wish to have more "light" readers than the "full" ones, to avoid indirectly inducing the "stop_machine effect" without even actually using stop_machine(). And the patches show that it _is_ generally true: 5 patches deal with "light" readers, whereas only 1 patch deals with a "full" reader. Also, the "light" readers happen to be in very hot paths. So it makes a lot of sense to have such a distinction and a corresponding light-weight API. Links to previous versions: v3: https://lkml.org/lkml/2012/12/7/287 v2: https://lkml.org/lkml/2012/12/5/322 v1: https://lkml.org/lkml/2012/12/4/88 Comments and suggestions welcome! -- Paul E. McKenney (1): cpu: No more __stop_machine() in _cpu_down() Srivatsa S. 
Bhat (8): CPU hotplug: Provide APIs to prevent CPU offline from atomic context CPU hotplug: Convert preprocessor macros to static inline functions smp, cpu hotplug: Fix smp_call_function_*() to prevent CPU offline properly smp, cpu hotplug: Fix on_each_cpu_*() to prevent CPU offline properly sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq() kick_process(), cpu-hotplug: Prevent offlining of target CPU properly yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly kvm, vmx: Add atomic synchronization with CPU Hotplug arch/x86/kvm/vmx.c |8 +- include/linux/cpu.h |8 +- kernel/cpu.c| 206 ++- kernel/sched/core.c | 22 + kernel/smp.c| 63 ++-- 5 files changed, 273 insertions(+), 34 deletions(-) Thanks, Srivatsa S. Bhat IBM Linux Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/