Re: [REPOST PATCH v6 0/4] kgdb: Fix kgdb_roundup_cpus()
Hi, On Fri, Dec 7, 2018 at 9:42 AM Catalin Marinas wrote: > > On Tue, Dec 04, 2018 at 07:38:24PM -0800, Douglas Anderson wrote: > > Douglas Anderson (4): > > kgdb: Remove irq flags from roundup > > kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() > > kgdb: Don't round up a CPU that failed rounding up before > > kdb: Don't back trace on a cpu that didn't round up > > FWIW, trying these on arm64 (ThunderX2) with CONFIG_KGDB_TESTS_ON_BOOT=y > on top of 4.20-rc5 doesn't boot. It looks like they leave interrupts > disabled when they shouldn't and it trips over the BUG at > mm/vmalloc.c:1380 (called via do_fork -> copy_process). > > Now, I don't think these patches make things worse on arm64 since prior > to them the kgdb boot tests on arm64 were stuck in a loop (RUN > singlestep). Thanks for the report! ...actually, I'd never tried CONFIG_KGDB_TESTS before. ...so I tried them now: A) chromeos-4.19 tree on qcom-sdm845 without this series: booted up OK B) chromeos-4.19 tree on qcom-sdm845 with this series: booted up OK C) v4.20-rc5-90-g30002dd008ed on rockchip-rk3399 (kevin) with this series: booted up OK Example output from B) above: localhost ~ # dmesg | grep kgdbts [2.139814] KGDB: Registered I/O driver kgdbts [2.144582] kgdbts:RUN plant and detach test [2.165333] kgdbts:RUN sw breakpoint test [2.172990] kgdbts:RUN bad memory access test [2.178640] kgdbts:RUN singlestep test 1000 iterations [2.187765] kgdbts:RUN singlestep [0/1000] [2.559596] kgdbts:RUN singlestep [100/1000] [2.931419] kgdbts:RUN singlestep [200/1000] [3.303474] kgdbts:RUN singlestep [300/1000] [3.675121] kgdbts:RUN singlestep [400/1000] [4.046867] kgdbts:RUN singlestep [500/1000] [4.418920] kgdbts:RUN singlestep [600/1000] [4.790824] kgdbts:RUN singlestep [700/1000] [5.162479] kgdbts:RUN singlestep [800/1000] [5.534103] kgdbts:RUN singlestep [900/1000] [5.902299] kgdbts:RUN do_fork for 100 breakpoints [8.463900] KGDB: Unregistered I/O driver kgdbts, debugger disabled ...so I guess I'm a little confused. Either I have a different config than you do or something is special about your machine? NOTE: In general I've never considered "single step" as reliable in kgdb. I mostly use kgdb as "after the fact" crash debugging to analyze local variables / memory / other tasks. If it worked that'd actually be kinda great, but at least when I started using kgdb years ago I learned that it didn't work and stopped trying... -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v6 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
Hi, On Mon, Dec 3, 2018 at 7:57 AM Daniel Thompson wrote: > > On Tue, Nov 27, 2018 at 09:38:37AM -0800, Douglas Anderson wrote: > > When I had lockdep turned on and dropped into kgdb I got a nice splat > > on my system. Specifically it hit: > > DEBUG_LOCKS_WARN_ON(current->hardirq_context) > > > > Specifically it looked like this: > > sysrq: SysRq : DEBUG > > [ cut here ] > > DEBUG_LOCKS_WARN_ON(current->hardirq_context) > > WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 > > lockdep_hardirqs_on+0xf0/0x160 > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 > > pstate: 604003c9 (nZCv DAIF +PAN -UAO) > > pc : lockdep_hardirqs_on+0xf0/0x160 > > ... > > Call trace: > >lockdep_hardirqs_on+0xf0/0x160 > >trace_hardirqs_on+0x188/0x1ac > >kgdb_roundup_cpus+0x14/0x3c > >kgdb_cpu_enter+0x53c/0x5cc > >kgdb_handle_exception+0x180/0x1d4 > >kgdb_compiled_brk_fn+0x30/0x3c > >brk_handler+0x134/0x178 > >do_debug_exception+0xfc/0x178 > >el1_dbg+0x18/0x78 > >kgdb_breakpoint+0x34/0x58 > >sysrq_handle_dbg+0x54/0x5c > >__handle_sysrq+0x114/0x21c > >handle_sysrq+0x30/0x3c > >qcom_geni_serial_isr+0x2dc/0x30c > > ... > > ... > > irq event stamp: ...45 > > hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 > > hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 > > softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 > > softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 > > ---[ end trace adf21f830c46e638 ]--- > > > > Looking closely at it, it seems like a really bad idea to be calling > > local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems > > like it could violate spinlock semantics and cause a deadlock. > > > > Instead, let's use a private csd alongside > > smp_call_function_single_async() to round up the other CPUs. Using > > smp_call_function_single_async() doesn't require interrupts to be > > enabled so we can remove the offending bit of code. > > > > In order to avoid duplicating this across all the architectures that > > use the default kgdb_roundup_cpus(), we'll add a "weak" implementation > > to debug_core.c. > > > > Looking at all the people who previously had copies of this code, > > there were a few variants. I've attempted to keep the variants > > working like they used to. Specifically: > > * For arch/arc we passed NULL to kgdb_nmicallback() instead of > > get_irq_regs(). > > * For arch/mips there was a bit of extra code around > > kgdb_nmicallback() > > > > NOTE: In this patch we will still get into trouble if we try to round > > up a CPU that failed to round up before. We'll try to round it up > > again and potentially hang when we try to grab the csd lock. That's > > not new behavior but we'll still try to do better in a future patch. > > > > Suggested-by: Daniel Thompson > > Signed-off-by: Douglas Anderson > > --- > > > > Changes in v6: > > - Moved smp_call_function_single_async() error check to patch 3. > > > > Changes in v5: > > - Add a comment about get_irq_regs(). > > - get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus(). > > - for_each_cpu() => for_each_online_cpu() > > - Error check smp_call_function_single_async() > > > > Changes in v4: None > > Changes in v3: > > - No separate init call. > > - Don't round up the CPU that is doing the rounding up. > > - Add "#ifdef CONFIG_SMP" to match the rest of the file. > > - Updated desc saying we don't solve the "failed to roundup" case. > > - Document the ignored parameter. > > > > Changes in v2: > > - Removing irq flags separated from fixing lockdep splat. > > - Don't use smp_call_function (Daniel). > > > > arch/arc/kernel/kgdb.c | 10 ++ > > arch/arm/kernel/kgdb.c | 12 --- > > arch/arm64/kernel/kgdb.c | 12 --- > > arch/hexagon/kernel/kgdb.c | 27 - > > arch/mips/kernel/kgdb.c| 9 + > > arch/powerpc/kernel/kgdb.c | 4 ++-- > > arch/sh/kernel/kgdb.c | 12 --- > > Please could you re-send with the arch maintainers for these platforms > included in the Cc: of the patch description rather than just in the > e-mail header. > > I'd like to make sure arch maintainers have a chance to opt out if they > want to (it's a weak symbol so an arch could chose not to use it). > > Otherwise I shall start testing shortly! OK, I did a repost of v6 with the Cc's and also the Acks I've received so far. There are no code changes in the repost. Please let me know if you have additional comments and thanks for your help. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v5 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
Hi, On Mon, Nov 26, 2018 at 9:39 PM kbuild test robot wrote: > > Hi Douglas, > > Thank you for the patch! Yet something to improve: > > [auto build test ERROR on kgdb/kgdb-next] > [also build test ERROR on v4.20-rc4 next-20181126] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425 > base: https://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git > kgdb-next > config: sparc64-allyesconfig (attached as .config) > compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 > reproduce: > wget > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O > ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > GCC_VERSION=7.2.0 make.cross ARCH=sparc64 > > Note: the > linux-review/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425 HEAD > b8d0502e65f2d7a2187baa69870146a6fbf18c9f builds fine. > It only hurts bisectibility. > > All errors (new ones prefixed by >>): > >kernel/debug/debug_core.c: In function 'kgdb_roundup_cpus': > >> kernel/debug/debug_core.c:261:18: error: 'struct debuggerinfo_struct' has > >> no member named 'rounding_up' >kgdb_info[cpu].rounding_up = false; > ^ Oops, I squashed this into the wrong patch. It should have been in patch #3. Sorry about that. I'll send out a quick v6. ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
Hi, On Wed, Nov 14, 2018 at 2:07 PM Will Deacon wrote: > > > +void __weak kgdb_call_nmi_hook(void *ignored) > > +{ > > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs()); > > +} > > I suppose you could pass the cpu as an argument, but it doesn't really > matter. I probably won't change it for now if it doesn't matter... > Also, I think there are cases where the CSD callback can run without > having received an IPI, so we could potentially up passing NULL for the regs > here which probably goes boom. Hrm, good point. This is not a new issue so I'd tend to add it to the TODO list rather than block the series. I'll also add a comment in the code since I'm touching it anyway. Interestingly enough quite a bit of things continue to work just fine even if this is NULL. I simulated setting this to NULL for all CPUs and I could drop into the debugger, type "btc" to backtrace all CPUs, attach to kgdb, etc. ...but when I typed "cpu 1" it went boom. So it seems like parts of kdb use this but definitely not everything. Also interesting is that on MIPS this is always NULL. I have no idea why but my patch series preserves this oddity. Presumably if someone was on a SMP MIPS machine and did "cpu 1" from kdb they'd go boom too. In general kdb has a lot of crufty stuff like this in it. We need to work to get rid of the cruft but one step at a time I think. I've started a kgdb-wishlist: https://bugs.chromium.org/p/chromium/issues/list?q=label%3Akgdb-wishlist ...and this is crbug.com/908723 > > +void __weak kgdb_roundup_cpus(void) > > +{ > > + call_single_data_t *csd; > > + int this_cpu = get_cpu(); > > Do you actually need to disable preemption here? afaict, irqs are already > disabled by the kgdb core. Ah, right. I can just use raw_smp_processor_id(). Done. I didn't try to see if I could use smp_processor_id() since kgdb_call_nmi_hook() already used raw_smp_processor_id(), but I can dig if you wish. > > + int cpu; > > + > > + for_each_cpu(cpu, cpu_online_mask) { > > for_each_online_cpu(cpu) ? Done. > I'm assuming this is serialised wrt CPU hotplug somehow? I doubt it. I can add it to my wishlist (crbug.com/908722) , but I don't think it's something I'm going to try to solve right now and it's definitely not new. I think we need to make some sort of attempt in kgdb_cpu_enter() to stop hotplugging, though we'd have to take into account that we may be entering kgdb in an IRQ context so it might be hard to grab a mutex. We need to account for it there since that function has code like: > while (kgdb_do_roundup && --time_left && >(atomic_read(_in_kgdb) + atomic_read(_in_kgdb)) != >online_cpus) > udelay(1000); ...and that would also be broken if cpus were plugging / unplugging. In general, at least, the worst case would be that we'd either have an extra 1 second delay entering the debugger (because we were waiting for a CPU to respond that's been hotplugged) or we'd enter kgdb without stopping one of the CPUs. Neither of those is ideal but I don't think we'd end up in too bad shape. Oh, but actually, I guess I should probably check the error return of smp_call_function_single_async() and if it returns an error I should unset rounding_up... That would make things behave slightly better and is probably right anyway. Overall: thank you very much for the review and the feedback. Sorry I'm not really fixing everything here. My hope is to move things to a slightly better state but I don't have time to fix everything. Hopefully I can find some more time soon to fix more, or perhaps someone else will. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
Hi, On Sat, Nov 3, 2018 at 3:45 AM Daniel Thompson wrote: > > On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote: > > > As mentioned in another part of the thread we can also add robustness > > > by skipping a cpu where csd->flags != 0 (and adding an appropriately > > > large comment regarding why). Doing the check directly is abusing > > > internal knowledge that smp.c normally keeps to itself so an accessor > > > of some kind would be needed. > > > > Sure. I could add smp_async_func_finished() that just looked like: > > > > int smp_async_func_finished(call_single_data_t *csd) > > { > > return !(csd->flags & CSD_FLAG_LOCK); > > } > > > > My understanding of all the mutual exclusion / memory barrier concepts > > employed by smp.c is pretty weak, though. I'm hoping that it's safe > > to just access the structure and check the bit directly. > > > > ...but do you think adding a generic accessor like this is better than > > just keeping track of this in kgdb directly? I could avoid the > > accessor by adding a "rounding_up" member to "struct > > debuggerinfo_struct" and doing something like this in roundup: > > > > /* If it didn't round up last time, don't try again */ > > if (kgdb_info[cpu].rounding_up) > > continue > > > > kgdb_info[cpu].rounding_up = true > > smp_call_function_single_async(cpu, csd); > > > > ...and then in kgdb_nmicallback() I could just add: > > > > kgdb_info[cpu].rounding_up = false > > > > In that case we're not adding a generic accessor to smp.c that most > > people should never use. > > Whilst it is very tempting to make a sarcastic reply here ("Of course! What > kgdb really needs is yet another set of condition variables") I can't > because I actually agree with the proposal. I don't really want kgdb to > be too "special" especially when it doesn't need to be. > > Only thing to note is that rounding_up will not be manipulated within a > common spin lock so you might have to invest a bit of thought to make > sure any races between the master and slave as the slave CPU clears the > flag are benign. OK, so I've hopefully got all this thought through and posted v3. I've put most of my thoughts in the patch descriptions themselves so let's continue the discussion there. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
Hi, On Wed, Oct 31, 2018 at 11:40 AM Daniel Thompson wrote: > > On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote: > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c > > index f3cadda45f07..9a3f952de6ed 100644 > > --- a/kernel/debug/debug_core.c > > +++ b/kernel/debug/debug_core.c > > @@ -55,6 +55,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct > > pt_regs *regs) > > return 0; > > } > > > > +/* > > + * Default (weak) implementation for kgdb_roundup_cpus > > + */ > > + > > +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd); > > + > > +void __weak kgdb_call_nmi_hook(void *ignored) > > +{ > > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs()); > > +} > > + > > +void __weak kgdb_roundup_cpus(void) > > +{ > > + call_single_data_t *csd; > > + int cpu; > > + > > + for_each_cpu(cpu, cpu_online_mask) { > > + csd = _cpu(kgdb_roundup_csd, cpu); > > + smp_call_function_single_async(cpu, csd); > > + } > > smp_call_function() automatically skips the calling CPU but this code does > not. It isn't a hard bug since kgdb_nmicallback() does a re-entrancy > check but I'd still prefer to skip the calling CPU. I'll incorporate this into the next version. > As mentioned in another part of the thread we can also add robustness > by skipping a cpu where csd->flags != 0 (and adding an appropriately > large comment regarding why). Doing the check directly is abusing > internal knowledge that smp.c normally keeps to itself so an accessor > of some kind would be needed. Sure. I could add smp_async_func_finished() that just looked like: int smp_async_func_finished(call_single_data_t *csd) { return !(csd->flags & CSD_FLAG_LOCK); } My understanding of all the mutual exclusion / memory barrier concepts employed by smp.c is pretty weak, though. I'm hoping that it's safe to just access the structure and check the bit directly. ...but do you think adding a generic accessor like this is better than just keeping track of this in kgdb directly? I could avoid the accessor by adding a "rounding_up" member to "struct debuggerinfo_struct" and doing something like this in roundup: /* If it didn't round up last time, don't try again */ if (kgdb_info[cpu].rounding_up) continue kgdb_info[cpu].rounding_up = true smp_call_function_single_async(cpu, csd); ...and then in kgdb_nmicallback() I could just add: kgdb_info[cpu].rounding_up = false In that case we're not adding a generic accessor to smp.c that most people should never use. I'll wait to hear back from you if you think the accessor is OK. It seems like it might be nice not to have to add something to smp.c just for this one use case. > > +} > > + > > +static void kgdb_generic_roundup_init(void) > > +{ > > + call_single_data_t *csd; > > + int cpu; > > + > > + for_each_possible_cpu(cpu) { > > + csd = _cpu(kgdb_roundup_csd, cpu); > > + csd->func = kgdb_call_nmi_hook; > > + } > > +} > > I can't help noticing this code is very similar to kgdb_roundup_cpus. Do > we really gain much from ahead-of-time initializing csd->func? Oh! Right... At first I thought about just trying to put the "csd" on the stack in kgdb_roundup_cpus() but then I realized that it needed to persist past the end of kgdb_roundup_cpus(). ...and once I gave up on the idea of putting it on the stack I decided I needed the init. ...but you're right that I don't really. The only thing I'm initting is the function pointer and it totally wouldn't hurt to just init that over and over again every time kgdb_roundup_cpus() is called. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup
Hi, On Tue, Oct 30, 2018 at 4:46 AM Daniel Thompson wrote: > > On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote: > > The function kgdb_roundup_cpus() was passed a parameter that was > > documented as: > > > > > the flags that will be used when restoring the interrupts. There is > > > local_irq_save() call before kgdb_roundup_cpus(). > > > > Nobody used those flags. Anyone who wanted to temporarily turn on > > interrupts just did local_irq_enable() and local_irq_disable() without > > looking at them. So we can definitely remove the flags. > > On the whole I'd rather that this change... > > > > Speaking of calling local_irq_enable(), it seems like a bad idea and > > it caused a nice splat on my system with lockdep turned on. > > Specifically it hit: > > DEBUG_LOCKS_WARN_ON(current->hardirq_context) > > ... and fixes for this this were in separate patches. They don't appear > especially related. Agreed that this is cleaner. Done for v2. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb
Hi, On Tue, Oct 30, 2018 at 4:56 AM Daniel Thompson wrote: > > On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote: > > Looking back, this is pretty much two series squashed that could be > > treated indepdently. The first is a serial series and the second is a > > kgdb series. > > Indeed. > > I couldn't work out the link between the first 5 patches and the last 2 > until I read this... > > Is anything in the 01->05 patch set even related to kgdb? From the stack > traces it looks to me like the lock dep warning would trigger for any > sysrq. I think separating into two threads for v2 would be sensible. Yes, sorry about the mess. Splitting this into two series makes the most sense. Then I can focus more on trying to get the CCs right and people can just get the patches that matter to them. OK, I've sent v2 of both series out now. Please yell if you can't find them for whatever reason. -Doug -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()
Daniel, On Tue, Oct 30, 2018 at 2:42 AM Daniel Thompson wrote: > > On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote: > > In kgdb_roundup_cpus() we've got code that looks like: > > local_irq_enable(); > > smp_call_function(kgdb_call_nmi_hook, NULL, 0); > > local_irq_disable(); > > > > In certain cases when we drop into kgdb (like with sysrq-g on a serial > > console) we'll get a big yell that looks like: > > > > sysrq: SysRq : DEBUG > > [ cut here ] > > DEBUG_LOCKS_WARN_ON(current->hardirq_context) > > WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 > > lockdep_hardirqs_on+0xf0/0x160 > > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 > > pstate: 604003c9 (nZCv DAIF +PAN -UAO) > > pc : lockdep_hardirqs_on+0xf0/0x160 > > ... > > Call trace: > >lockdep_hardirqs_on+0xf0/0x160 > >trace_hardirqs_on+0x188/0x1ac > >kgdb_roundup_cpus+0x14/0x3c > >kgdb_cpu_enter+0x53c/0x5cc > >kgdb_handle_exception+0x180/0x1d4 > >kgdb_compiled_brk_fn+0x30/0x3c > >brk_handler+0x134/0x178 > >do_debug_exception+0xfc/0x178 > >el1_dbg+0x18/0x78 > >kgdb_breakpoint+0x34/0x58 > >sysrq_handle_dbg+0x54/0x5c > >__handle_sysrq+0x114/0x21c > >handle_sysrq+0x30/0x3c > >qcom_geni_serial_isr+0x2dc/0x30c > > ... > > ... > > irq event stamp: ...45 > > hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 > > hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 > > softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 > > softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 > > ---[ end trace adf21f830c46e638 ]--- > > > > Let's add kgdb to the list of reasons not to warn in > > smp_call_function_many(). That will allow us (in a future patch) to > > stop calling local_irq_enable() which will get rid of the original > > splat. > > > > NOTE: with this change comes the obvious question: will we start > > deadlocking more often now when we drop into the debugger. I can't > > say that for sure one way or the other, but the fact that we do the > > same logic for "oops_in_progress" makes me feel like it shouldn't > > happen too often. Also note that the old logic of turning on > > interrupts temporarily wasn't exactly safe since (I presume) that > > could have violated spin_lock_irqsave() semantics and ended up with a > > deadlock of its own. > > This is part of the code to bring all the cores to a halt and since > the other cores are still running kgdb isn't yet able to use the fact > all the CPUs are halted to bend the rules. It is better for this code > to play by the rules if it can. > > Is is possible to get the roundup functions to use a private csd > alongside smp_call_function_single_async()? We could add a helper > function to the debug core to avoid having to add cpu_online loops into > every kgdb_roundup_cpus() implementaton. Exactly the kind of helpful suggestion I was looking for. Thank you very much! See v2 and hopefully it matches what you were thinking of. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH v4] mmc: dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit systems
Hi, On Wed, Feb 28, 2018 at 3:53 AM, Evgeniy Didinwrote: > In commit 9d9491a7da2a ("mmc: dw_mmc: Fix the DTO timeout calculation") and > commit 4c2357f57dd5 ("mmc: dw_mmc: Fix the CTO timeout calculation") > have been made changes which cause multiply overflow for 32-bit systems. > The broken timeout calculations leads to unexpected ETIMEDOUT errors and > causes stacktrace splat (such as below) during normal data exchange > with SD-card. > > | Running : 4M-check-reassembly-tcp-cmykw2-rotatew2.out -v0 -w1 > | - Info: Finished target initialization. > | mmcblk0: error -110 transferring data, sector 320544, nr 2048, cmd response > | 0x900, card status 0x0 > > DIV_ROUND_UP_ULL helps to escape usage of __udivdi3() from libgcc and so > code gets compiled on all 32-bit platforms as opposed to usage > of DIV_ROUND_UP when we may only compile stuff on a very few arches. > > Lets cast this multiply to u64 type which prevents overflow. > > Tested-by: Vineet Gupta > Reported-by: Vineet Gupta # ARC STAR 9001306872 > HSDK, sdio: board crashes when copying big files > Fixes: 9d9491a7da2a ("mmc: dw_mmc: Fix the DTO timeout calculation") > Fixes: 4c2357f57dd5 ("mmc: dw_mmc: Fix the CTO timeout calculation") > > Signed-off-by: Evgeniy Didin > > CC: Alexey Brodkin > CC: Eugeniy Paltsev > CC: Douglas Anderson > CC: Ulf Hansson > CC: Andy Shevchenko > CC: Jisheng Zhang > CC: Shawn Lin > CC: Vineet Gupta > CC: linux-ker...@vger.kernel.org > CC: linux-snps-arc@lists.infradead.org > Cc: > --- > Changes since v3: > -Switch DIV_ROUND_UP macro to DIV_ROUND_UP_ULL > -Make one patch from two patches > -Modify commit message > > Changes sinve v2: > -add fix for cto_ms > > Changes since v1: > -uint64_t switched to u64 > > drivers/mmc/host/dw_mmc.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) FYI that your messages keep ending up in my spam folder and I just see other people's replies. Not exactly sure why... Note also that your patch really doesn't have all the tags in the right places. All the Tags including the Signed-off-by and Cc are supposed to be squashed in one paragraph ...and I believe "CC" is canonically supposed to be "Cc". Presumably the maintainer can fix this up when applying, but please try to do better in the future. Speaking of which, you didn't include the dw_mmc maintainer (Jaehoon Chung) in your post? That seems not so great unless you know something that get_maintainers doesn't know about who will apply this patch. $ ./scripts/get_maintainer.pl -f drivers/mmc/host/dw_mmc.c Jaehoon Chung (maintainer:SYNOPSYS DESIGNWARE MMC/SD/SDIO DRIVER) As far as the actual patch, at first I thought maybe we could avoid the 64-bit math and just pre-divide bus_hz by MSEC_PER_SEC (just like used to happen before my change), but I think there's still a chance of overflow, right? Because, for instance: drto_clks can be a full 24 bits drto_div can be effectively 9 bits ...we could do further tricks (further rounding up the result), but it seems like just doing the 64-bit math is fine too unless someone has any strong objections... Reviewed-by: Douglas Anderson ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH 0/5] earlycon hang under some conditions
Hi, On Mon, Jul 17, 2017 at 11:02 PM, Jeffy Chenwrote: > I was testing earlycon with 8250 dw serial console. And it hangs in > these cases: > 1/ kernel hang when calling early write function after free_initmem: > a) the earlycon not disabled after the init code(due to keep_bootcon or >not specify a real console to switch to) > b) the early write func is marked as __init, for example 8250_early. > > 2/ kernel hang when calling early write function after disable unused > clks/pm domain: > a) the earlycon not disabled after the init code > b) the disable unused clks/pm domain kill the requiered clks/pm >domain, since they are not referenced by the earlycon. > > 3/ kernel hang when calling early write function after the serial >console driver runtime suspended: > a) the earlycon not disabled after the init code > b) the serial console driver's runtime suspend kills the requiered >clks/pm domain, since they are not referenced by the earlycon. > > This serial fix 1/ case only. > > > > Jeffy Chen (5): > serial: arc: Remove __init marking from early write > serial: omap: Remove __init marking from early write > serial: xuartps: Remove __init marking from early write > serial: 8250_ingenic: Remove __init marking from early write > serial: 8250_early: Remove __init marking from early write > > drivers/tty/serial/8250/8250_early.c | 8 > drivers/tty/serial/8250/8250_ingenic.c | 8 > drivers/tty/serial/arc_uart.c | 4 ++-- > drivers/tty/serial/omap-serial.c | 13 ++--- > drivers/tty/serial/xilinx_uartps.c | 2 +- > 5 files changed, 17 insertions(+), 18 deletions(-) I looked through all 5 patches and they seem sane to me. I didn't go through and test them since Brian already tested the patches for the one UART driver I'd have access to anyway. As per my previous email, I don't see any need to solve all the world'd problems with this patch series, plus the keep_bootcon seems to be an experts / debugging type option and it seems sane if you need to take care in using it. So officially for the series (FWIW since I'm just an interested 3rd party): Reviewed-by: Douglas Anderson -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc
Re: [PATCH 0/5] earlycon hang under some conditions
Hi, On Tue, Jul 18, 2017 at 10:51 AM, Brian Norriswrote: > Hi Jeffy, > > On Tue, Jul 18, 2017 at 02:02:53PM +0800, Jeffy Chen wrote: >> I was testing earlycon with 8250 dw serial console. And it hangs in >> these cases: >> 1/ kernel hang when calling early write function after free_initmem: >> a) the earlycon not disabled after the init code(due to keep_bootcon or >>not specify a real console to switch to) >> b) the early write func is marked as __init, for example 8250_early. > > FWIW, I tested 8250_early with "keep_bootcon", and this fixes that: > > Tested-by: Brian Norris > > But then I get double console, if I have both a "real" and "early" > console. > > If I were to *only* have the early console, then I might hit the > problems you mention: > >> 2/ kernel hang when calling early write function after disable unused >> clks/pm domain: >> a) the earlycon not disabled after the init code >> b) the disable unused clks/pm domain kill the requiered clks/pm >>domain, since they are not referenced by the earlycon. >> >> 3/ kernel hang when calling early write function after the serial >>console driver runtime suspended: >> a) the earlycon not disabled after the init code >> b) the serial console driver's runtime suspend kills the requiered >>clks/pm domain, since they are not referenced by the earlycon. >> >> This serial fix 1/ case only. > > Problems 2 and 3 look like something that's not really in scope for > early consoles. There's a reason they are mainly supported for early > boot, and we try to switch off of them. There isn't really a good way to > handle all the clock and PM infrastructure without...switching off the > earlycon and using the real one :) > > So, I guess this patchset has value for systems where the clock/PM > requirements are simple enough, and the earlycon can actually be useful > beyond early init. Presumably someone using this would just use the "clk_ignore_unused" and "pd_ignore_unused" kernel parameters to help them... In this case that seems fine to me. The "early" boot console is early because it can't rely on all of the generic OS mechanisms to do things super cleanly, so requiring the user to know that they should add the extra command line arguments seems sane. -Doug ___ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc