Re: [REPOST PATCH v6 0/4] kgdb: Fix kgdb_roundup_cpus()

2018-12-07 Thread Doug Anderson
Hi,

On Fri, Dec 7, 2018 at 9:42 AM Catalin Marinas  wrote:
>
> On Tue, Dec 04, 2018 at 07:38:24PM -0800, Douglas Anderson wrote:
> > Douglas Anderson (4):
> >   kgdb: Remove irq flags from roundup
> >   kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
> >   kgdb: Don't round up a CPU that failed rounding up before
> >   kdb: Don't back trace on a cpu that didn't round up
>
> FWIW, trying these on arm64 (ThunderX2) with CONFIG_KGDB_TESTS_ON_BOOT=y
> on top of 4.20-rc5 doesn't boot. It looks like they leave interrupts
> disabled when they shouldn't and it trips over the BUG at
> mm/vmalloc.c:1380 (called via do_fork -> copy_process).
>
> Now, I don't think these patches make things worse on arm64 since prior
> to them the kgdb boot tests on arm64 were stuck in a loop (RUN
> singlestep).

Thanks for the report!  ...actually, I'd never tried CONFIG_KGDB_TESTS
before.  ...so I tried them now:

A) chromeos-4.19 tree on qcom-sdm845 without this series: booted up OK
B) chromeos-4.19 tree on qcom-sdm845 with this series: booted up OK
C) v4.20-rc5-90-g30002dd008ed on rockchip-rk3399 (kevin) with this
series: booted up OK

Example output from B) above:

localhost ~ # dmesg | grep kgdbts
[2.139814] KGDB: Registered I/O driver kgdbts
[2.144582] kgdbts:RUN plant and detach test
[2.165333] kgdbts:RUN sw breakpoint test
[2.172990] kgdbts:RUN bad memory access test
[2.178640] kgdbts:RUN singlestep test 1000 iterations
[2.187765] kgdbts:RUN singlestep [0/1000]
[2.559596] kgdbts:RUN singlestep [100/1000]
[2.931419] kgdbts:RUN singlestep [200/1000]
[3.303474] kgdbts:RUN singlestep [300/1000]
[3.675121] kgdbts:RUN singlestep [400/1000]
[4.046867] kgdbts:RUN singlestep [500/1000]
[4.418920] kgdbts:RUN singlestep [600/1000]
[4.790824] kgdbts:RUN singlestep [700/1000]
[5.162479] kgdbts:RUN singlestep [800/1000]
[5.534103] kgdbts:RUN singlestep [900/1000]
[5.902299] kgdbts:RUN do_fork for 100 breakpoints
[8.463900] KGDB: Unregistered I/O driver kgdbts, debugger disabled

...so I guess I'm a little confused.  Either I have a different config
than you do or something is special about your machine?


NOTE: In general I've never considered "single step" as reliable in
kgdb.  I mostly use kgdb as "after the fact" crash debugging to
analyze local variables / memory / other tasks.  If it worked that'd
actually be kinda great, but at least when I started using kgdb years
ago I learned that it didn't work and stopped trying...


-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v6 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-12-04 Thread Doug Anderson
Hi,

On Mon, Dec 3, 2018 at 7:57 AM Daniel Thompson
 wrote:
>
> On Tue, Nov 27, 2018 at 09:38:37AM -0800, Douglas Anderson wrote:
> > When I had lockdep turned on and dropped into kgdb I got a nice splat
> > on my system.  Specifically it hit:
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> >
> > Specifically it looked like this:
> >   sysrq: SysRq : DEBUG
> >   [ cut here ]
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> >   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
> > lockdep_hardirqs_on+0xf0/0x160
> >   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> >   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> >   pc : lockdep_hardirqs_on+0xf0/0x160
> >   ...
> >   Call trace:
> >lockdep_hardirqs_on+0xf0/0x160
> >trace_hardirqs_on+0x188/0x1ac
> >kgdb_roundup_cpus+0x14/0x3c
> >kgdb_cpu_enter+0x53c/0x5cc
> >kgdb_handle_exception+0x180/0x1d4
> >kgdb_compiled_brk_fn+0x30/0x3c
> >brk_handler+0x134/0x178
> >do_debug_exception+0xfc/0x178
> >el1_dbg+0x18/0x78
> >kgdb_breakpoint+0x34/0x58
> >sysrq_handle_dbg+0x54/0x5c
> >__handle_sysrq+0x114/0x21c
> >handle_sysrq+0x30/0x3c
> >qcom_geni_serial_isr+0x2dc/0x30c
> >   ...
> >   ...
> >   irq event stamp: ...45
> >   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> >   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> >   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> >   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> >   ---[ end trace adf21f830c46e638 ]---
> >
> > Looking closely at it, it seems like a really bad idea to be calling
> > local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
> > like it could violate spinlock semantics and cause a deadlock.
> >
> > Instead, let's use a private csd alongside
> > smp_call_function_single_async() to round up the other CPUs.  Using
> > smp_call_function_single_async() doesn't require interrupts to be
> > enabled so we can remove the offending bit of code.
> >
> > In order to avoid duplicating this across all the architectures that
> > use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
> > to debug_core.c.
> >
> > Looking at all the people who previously had copies of this code,
> > there were a few variants.  I've attempted to keep the variants
> > working like they used to.  Specifically:
> > * For arch/arc we passed NULL to kgdb_nmicallback() instead of
> >   get_irq_regs().
> > * For arch/mips there was a bit of extra code around
> >   kgdb_nmicallback()
> >
> > NOTE: In this patch we will still get into trouble if we try to round
> > up a CPU that failed to round up before.  We'll try to round it up
> > again and potentially hang when we try to grab the csd lock.  That's
> > not new behavior but we'll still try to do better in a future patch.
> >
> > Suggested-by: Daniel Thompson 
> > Signed-off-by: Douglas Anderson 
> > ---
> >
> > Changes in v6:
> > - Moved smp_call_function_single_async() error check to patch 3.
> >
> > Changes in v5:
> > - Add a comment about get_irq_regs().
> > - get_cpu() => raw_smp_processor_id() in kgdb_roundup_cpus().
> > - for_each_cpu() => for_each_online_cpu()
> > - Error check smp_call_function_single_async()
> >
> > Changes in v4: None
> > Changes in v3:
> > - No separate init call.
> > - Don't round up the CPU that is doing the rounding up.
> > - Add "#ifdef CONFIG_SMP" to match the rest of the file.
> > - Updated desc saying we don't solve the "failed to roundup" case.
> > - Document the ignored parameter.
> >
> > Changes in v2:
> > - Removing irq flags separated from fixing lockdep splat.
> > - Don't use smp_call_function (Daniel).
> >
> >  arch/arc/kernel/kgdb.c | 10 ++
> >  arch/arm/kernel/kgdb.c | 12 ---
> >  arch/arm64/kernel/kgdb.c   | 12 ---
> >  arch/hexagon/kernel/kgdb.c | 27 -
> >  arch/mips/kernel/kgdb.c|  9 +
> >  arch/powerpc/kernel/kgdb.c |  4 ++--
> >  arch/sh/kernel/kgdb.c  | 12 ---
>
> Please could you re-send with the arch maintainers for these platforms
> included in the Cc: of the patch description rather than just in the
> e-mail header.
>
> I'd like to make sure arch maintainers have a chance to opt out if they
> want to (it's a weak symbol so an arch could chose not to use it).
>
> Otherwise I shall start testing shortly!

OK, I did a repost of v6 with the Cc's and also the Acks I've received
so far.  There are no code changes in the repost.  Please let me know
if you have additional comments and thanks for your help.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v5 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-27 Thread Doug Anderson
Hi,

On Mon, Nov 26, 2018 at 9:39 PM kbuild test robot  wrote:
>
> Hi Douglas,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on kgdb/kgdb-next]
> [also build test ERROR on v4.20-rc4 next-20181126]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git 
> kgdb-next
> config: sparc64-allyesconfig (attached as .config)
> compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.2.0 make.cross ARCH=sparc64
>
> Note: the 
> linux-review/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181127-115425 HEAD 
> b8d0502e65f2d7a2187baa69870146a6fbf18c9f builds fine.
>   It only hurts bisectibility.
>
> All errors (new ones prefixed by >>):
>
>kernel/debug/debug_core.c: In function 'kgdb_roundup_cpus':
> >> kernel/debug/debug_core.c:261:18: error: 'struct debuggerinfo_struct' has 
> >> no member named 'rounding_up'
>kgdb_info[cpu].rounding_up = false;
>  ^

Oops, I squashed this into the wrong patch.  It should have been in
patch #3.  Sorry about that.  I'll send out a quick v6.

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-26 Thread Doug Anderson
Hi,

On Wed, Nov 14, 2018 at 2:07 PM Will Deacon  wrote:
>
> > +void __weak kgdb_call_nmi_hook(void *ignored)
> > +{
> > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> > +}
>
> I suppose you could pass the cpu as an argument, but it doesn't really
> matter.

I probably won't change it for now if it doesn't matter...


> Also, I think there are cases where the CSD callback can run without
> having received an IPI, so we could potentially up passing NULL for the regs
> here which probably goes boom.

Hrm, good point.  This is not a new issue so I'd tend to add it to the
TODO list rather than block the series.  I'll also add a comment in
the code since I'm touching it anyway.

Interestingly enough quite a bit of things continue to work just fine
even if this is NULL.  I simulated setting this to NULL for all CPUs
and I could drop into the debugger, type "btc" to backtrace all CPUs,
attach to kgdb, etc.  ...but when I typed "cpu 1" it went boom.  So it
seems like parts of kdb use this but definitely not everything.

Also interesting is that on MIPS this is always NULL.  I have no idea
why but my patch series preserves this oddity.  Presumably if someone
was on a SMP MIPS machine and did "cpu 1" from kdb they'd go boom too.

In general kdb has a lot of crufty stuff like this in it.  We need to
work to get rid of the cruft but one step at a time I think.

I've started a kgdb-wishlist:

https://bugs.chromium.org/p/chromium/issues/list?q=label%3Akgdb-wishlist

...and this is crbug.com/908723


> > +void __weak kgdb_roundup_cpus(void)
> > +{
> > + call_single_data_t *csd;
> > + int this_cpu = get_cpu();
>
> Do you actually need to disable preemption here? afaict, irqs are already
> disabled by the kgdb core.

Ah, right.  I can just use raw_smp_processor_id().  Done.  I didn't
try to see if I could use smp_processor_id() since
kgdb_call_nmi_hook() already used raw_smp_processor_id(), but I can
dig if you wish.


> > + int cpu;
> > +
> > + for_each_cpu(cpu, cpu_online_mask) {
>
> for_each_online_cpu(cpu) ?

Done.


> I'm assuming this is serialised wrt CPU hotplug somehow?

I doubt it.  I can add it to my wishlist (crbug.com/908722) , but I
don't think it's something I'm going to try to solve right now and
it's definitely not new.  I think we need to make some sort of attempt
in kgdb_cpu_enter() to stop hotplugging, though we'd have to take into
account that we may be entering kgdb in an IRQ context so it might be
hard to grab a mutex.  We need to account for it there since that
function has code like:

> while (kgdb_do_roundup && --time_left &&
>(atomic_read(_in_kgdb) + atomic_read(_in_kgdb)) !=
>online_cpus)
> udelay(1000);

...and that would also be broken if cpus were plugging / unplugging.

In general, at least, the worst case would be that we'd either have an
extra 1 second delay entering the debugger (because we were waiting
for a CPU to respond that's been hotplugged) or we'd enter kgdb
without stopping one of the CPUs.  Neither of those is ideal but I
don't think we'd end up in too bad shape.

Oh, but actually, I guess I should probably check the error return of
smp_call_function_single_async() and if it returns an error I should
unset rounding_up...  That would make things behave slightly better
and is probably right anyway.


Overall: thank you very much for the review and the feedback.  Sorry
I'm not really fixing everything here.  My hope is to move things to a
slightly better state but I don't have time to fix everything.
Hopefully I can find some more time soon to fix more, or perhaps
someone else will.


-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-11-06 Thread Doug Anderson
Hi,

On Sat, Nov 3, 2018 at 3:45 AM Daniel Thompson
 wrote:
>
> On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote:
> > > As mentioned in another part of the thread we can also add robustness
> > > by skipping a cpu where csd->flags != 0 (and adding an appropriately
> > > large comment regarding why). Doing the check directly is abusing
> > > internal knowledge that smp.c normally keeps to itself so an accessor
> > > of some kind would be needed.
> >
> > Sure.  I could add smp_async_func_finished() that just looked like:
> >
> > int smp_async_func_finished(call_single_data_t *csd)
> > {
> >   return !(csd->flags & CSD_FLAG_LOCK);
> > }
> >
> > My understanding of all the mutual exclusion / memory barrier concepts
> > employed by smp.c is pretty weak, though.  I'm hoping that it's safe
> > to just access the structure and check the bit directly.
> >
> > ...but do you think adding a generic accessor like this is better than
> > just keeping track of this in kgdb directly?  I could avoid the
> > accessor by adding a "rounding_up" member to "struct
> > debuggerinfo_struct" and doing something like this in roundup:
> >
> >   /* If it didn't round up last time, don't try again */
> >   if (kgdb_info[cpu].rounding_up)
> > continue
> >
> >   kgdb_info[cpu].rounding_up = true
> >   smp_call_function_single_async(cpu, csd);
> >
> > ...and then in kgdb_nmicallback() I could just add:
> >
> >   kgdb_info[cpu].rounding_up = false
> >
> > In that case we're not adding a generic accessor to smp.c that most
> > people should never use.
>
> Whilst it is very tempting to make a sarcastic reply here ("Of course! What
> kgdb really needs is yet another set of condition variables") I can't
> because I actually agree with the proposal. I don't really want kgdb to
> be too "special" especially when it doesn't need to be.
>
> Only thing to note is that rounding_up will not be manipulated within a
> common spin lock so you might have to invest a bit of thought to make
> sure any races between the master and slave as the slave CPU clears the
> flag are benign.

OK, so I've hopefully got all this thought through and posted v3.
I've put most of my thoughts in the patch descriptions themselves so
let's continue the discussion there.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

2018-10-31 Thread Doug Anderson
Hi,

On Wed, Oct 31, 2018 at 11:40 AM Daniel Thompson
 wrote:
>
> On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index f3cadda45f07..9a3f952de6ed 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -55,6 +55,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct 
> > pt_regs *regs)
> >   return 0;
> >  }
> >
> > +/*
> > + * Default (weak) implementation for kgdb_roundup_cpus
> > + */
> > +
> > +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> > +
> > +void __weak kgdb_call_nmi_hook(void *ignored)
> > +{
> > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> > +}
> > +
> > +void __weak kgdb_roundup_cpus(void)
> > +{
> > + call_single_data_t *csd;
> > + int cpu;
> > +
> > + for_each_cpu(cpu, cpu_online_mask) {
> > + csd = _cpu(kgdb_roundup_csd, cpu);
> > + smp_call_function_single_async(cpu, csd);
> > + }
>
> smp_call_function() automatically skips the calling CPU but this code does
> not. It isn't a hard bug since kgdb_nmicallback() does a re-entrancy
> check but I'd still prefer to skip the calling CPU.

I'll incorporate this into the next version.


> As mentioned in another part of the thread we can also add robustness
> by skipping a cpu where csd->flags != 0 (and adding an appropriately
> large comment regarding why). Doing the check directly is abusing
> internal knowledge that smp.c normally keeps to itself so an accessor
> of some kind would be needed.

Sure.  I could add smp_async_func_finished() that just looked like:

int smp_async_func_finished(call_single_data_t *csd)
{
  return !(csd->flags & CSD_FLAG_LOCK);
}

My understanding of all the mutual exclusion / memory barrier concepts
employed by smp.c is pretty weak, though.  I'm hoping that it's safe
to just access the structure and check the bit directly.

...but do you think adding a generic accessor like this is better than
just keeping track of this in kgdb directly?  I could avoid the
accessor by adding a "rounding_up" member to "struct
debuggerinfo_struct" and doing something like this in roundup:

  /* If it didn't round up last time, don't try again */
  if (kgdb_info[cpu].rounding_up)
continue

  kgdb_info[cpu].rounding_up = true
  smp_call_function_single_async(cpu, csd);

...and then in kgdb_nmicallback() I could just add:

  kgdb_info[cpu].rounding_up = false

In that case we're not adding a generic accessor to smp.c that most
people should never use.


I'll wait to hear back from you if you think the accessor is OK.  It
seems like it might be nice not to have to add something to smp.c just
for this one use case.


> > +}
> > +
> > +static void kgdb_generic_roundup_init(void)
> > +{
> > + call_single_data_t *csd;
> > + int cpu;
> > +
> > + for_each_possible_cpu(cpu) {
> > + csd = _cpu(kgdb_roundup_csd, cpu);
> > + csd->func = kgdb_call_nmi_hook;
> > + }
> > +}
>
> I can't help noticing this code is very similar to kgdb_roundup_cpus. Do
> we really gain much from ahead-of-time initializing csd->func?

Oh!  Right...  At first I thought about just trying to put the "csd"
on the stack in kgdb_roundup_cpus() but then I realized that it needed
to persist past the end of kgdb_roundup_cpus().  ...and once I gave up
on the idea of putting it on the stack I decided I needed the init.

...but you're right that I don't really.  The only thing I'm initting
is the function pointer and it totally wouldn't hurt to just init that
over and over again every time kgdb_roundup_cpus() is called.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 7/7] kgdb: Remove irq flags and local_irq_enable/disable from roundup

2018-10-30 Thread Doug Anderson
Hi,

On Tue, Oct 30, 2018 at 4:46 AM Daniel Thompson
 wrote:
>
> On Mon, Oct 29, 2018 at 11:07:07AM -0700, Douglas Anderson wrote:
> > The function kgdb_roundup_cpus() was passed a parameter that was
> > documented as:
> >
> > > the flags that will be used when restoring the interrupts. There is
> > > local_irq_save() call before kgdb_roundup_cpus().
> >
> > Nobody used those flags.  Anyone who wanted to temporarily turn on
> > interrupts just did local_irq_enable() and local_irq_disable() without
> > looking at them.  So we can definitely remove the flags.
>
> On the whole I'd rather that this change...
>
>
> > Speaking of calling local_irq_enable(), it seems like a bad idea and
> > it caused a nice splat on my system with lockdep turned on.
> > Specifically it hit:
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>
> ... and fixes for this this were in separate patches. They don't appear
> especially related.

Agreed that this is cleaner.  Done for v2.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/7] serial: Finish kgdb on qcom_geni; fix many lockdep splats w/ kgdb

2018-10-30 Thread Doug Anderson
Hi,

On Tue, Oct 30, 2018 at 4:56 AM Daniel Thompson
 wrote:
>
> On Mon, Oct 29, 2018 at 11:07:00AM -0700, Douglas Anderson wrote:
> > Looking back, this is pretty much two series squashed that could be
> > treated indepdently.  The first is a serial series and the second is a
> > kgdb series.
>
> Indeed.
>
> I couldn't work out the link between the first 5 patches and the last 2
> until I read this...
>
> Is anything in the 01->05 patch set even related to kgdb? From the stack
> traces it looks to me like the lock dep warning would trigger for any
> sysrq. I think separating into two threads for v2 would be sensible.

Yes, sorry about the mess.  Splitting this into two series makes the
most sense.  Then I can focus more on trying to get the CCs right and
people can just get the patches that matter to them.

OK, I've sent v2 of both series out now.  Please yell if you can't
find them for whatever reason.

-Doug

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 6/7] smp: Don't yell about IRQs disabled in kgdb_roundup_cpus()

2018-10-30 Thread Doug Anderson
Daniel,

On Tue, Oct 30, 2018 at 2:42 AM Daniel Thompson
 wrote:
>
> On Mon, Oct 29, 2018 at 11:07:06AM -0700, Douglas Anderson wrote:
> > In kgdb_roundup_cpus() we've got code that looks like:
> >   local_irq_enable();
> >   smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> >   local_irq_disable();
> >
> > In certain cases when we drop into kgdb (like with sysrq-g on a serial
> > console) we'll get a big yell that looks like:
> >
> >   sysrq: SysRq : DEBUG
> >   [ cut here ]
> >   DEBUG_LOCKS_WARN_ON(current->hardirq_context)
> >   WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 
> > lockdep_hardirqs_on+0xf0/0x160
> >   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
> >   pstate: 604003c9 (nZCv DAIF +PAN -UAO)
> >   pc : lockdep_hardirqs_on+0xf0/0x160
> >   ...
> >   Call trace:
> >lockdep_hardirqs_on+0xf0/0x160
> >trace_hardirqs_on+0x188/0x1ac
> >kgdb_roundup_cpus+0x14/0x3c
> >kgdb_cpu_enter+0x53c/0x5cc
> >kgdb_handle_exception+0x180/0x1d4
> >kgdb_compiled_brk_fn+0x30/0x3c
> >brk_handler+0x134/0x178
> >do_debug_exception+0xfc/0x178
> >el1_dbg+0x18/0x78
> >kgdb_breakpoint+0x34/0x58
> >sysrq_handle_dbg+0x54/0x5c
> >__handle_sysrq+0x114/0x21c
> >handle_sysrq+0x30/0x3c
> >qcom_geni_serial_isr+0x2dc/0x30c
> >   ...
> >   ...
> >   irq event stamp: ...45
> >   hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
> >   hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
> >   softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
> >   softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
> >   ---[ end trace adf21f830c46e638 ]---
> >
> > Let's add kgdb to the list of reasons not to warn in
> > smp_call_function_many().  That will allow us (in a future patch) to
> > stop calling local_irq_enable() which will get rid of the original
> > splat.
> >
> > NOTE: with this change comes the obvious question: will we start
> > deadlocking more often now when we drop into the debugger.  I can't
> > say that for sure one way or the other, but the fact that we do the
> > same logic for "oops_in_progress" makes me feel like it shouldn't
> > happen too often.  Also note that the old logic of turning on
> > interrupts temporarily wasn't exactly safe since (I presume) that
> > could have violated spin_lock_irqsave() semantics and ended up with a
> > deadlock of its own.
>
> This is part of the code to bring all the cores to a halt and since
> the other cores are still running kgdb isn't yet able to use the fact
> all the CPUs are halted to bend the rules. It is better for this code
> to play by the rules if it can.
>
> Is is possible to get the roundup functions to use a private csd
> alongside smp_call_function_single_async()? We could add a helper
> function to the debug core to avoid having to add cpu_online loops into
> every kgdb_roundup_cpus() implementaton.

Exactly the kind of helpful suggestion I was looking for.  Thank you
very much!  See v2 and hopefully it matches what you were thinking of.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH v4] mmc: dw_mmc: Fix the DTO/CTO timeout overflow calculation for 32-bit systems

2018-02-28 Thread Doug Anderson
Hi,

On Wed, Feb 28, 2018 at 3:53 AM, Evgeniy Didin
 wrote:
> In commit 9d9491a7da2a ("mmc: dw_mmc: Fix the DTO timeout calculation") and
> commit 4c2357f57dd5 ("mmc: dw_mmc: Fix the CTO timeout calculation")
> have been made changes which cause multiply overflow for 32-bit systems.
> The broken timeout calculations leads to unexpected ETIMEDOUT errors and
> causes stacktrace splat (such as below) during normal data exchange
> with SD-card.
>
> | Running :  4M-check-reassembly-tcp-cmykw2-rotatew2.out -v0 -w1
> | -  Info: Finished target initialization.
> | mmcblk0: error -110 transferring data, sector 320544, nr 2048, cmd response
> |  0x900, card status 0x0
>
> DIV_ROUND_UP_ULL helps to escape usage of __udivdi3() from libgcc and so
> code gets compiled on all 32-bit platforms as opposed to usage
> of DIV_ROUND_UP when we may only compile stuff on a very few arches.
>
> Lets cast this multiply to u64 type which prevents overflow.
>
> Tested-by: Vineet Gupta 
> Reported-by: Vineet Gupta  # ARC STAR 9001306872 
> HSDK, sdio: board crashes when copying big files
> Fixes: 9d9491a7da2a ("mmc: dw_mmc: Fix the DTO timeout calculation")
> Fixes: 4c2357f57dd5 ("mmc: dw_mmc: Fix the CTO timeout calculation")
>
> Signed-off-by: Evgeniy Didin 
>
> CC: Alexey Brodkin 
> CC: Eugeniy Paltsev 
> CC: Douglas Anderson 
> CC: Ulf Hansson 
> CC: Andy Shevchenko 
> CC: Jisheng Zhang 
> CC: Shawn Lin 
> CC: Vineet Gupta 
> CC: linux-ker...@vger.kernel.org
> CC: linux-snps-arc@lists.infradead.org
> Cc: 
> ---
> Changes since v3:
> -Switch DIV_ROUND_UP macro to DIV_ROUND_UP_ULL
> -Make one patch from two patches
> -Modify commit message
>
> Changes sinve v2:
> -add fix for cto_ms
>
> Changes since v1:
> -uint64_t switched to u64
>
>  drivers/mmc/host/dw_mmc.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

FYI that your messages keep ending up in my spam folder and I just see
other people's replies.  Not exactly sure why...  Note also that your
patch really doesn't have all the tags in the right places.  All the
Tags including the Signed-off-by and Cc are supposed to be squashed in
one paragraph  ...and I believe "CC" is canonically supposed to be
"Cc".  Presumably the maintainer can fix this up when applying, but
please try to do better in the future.  Speaking of which, you didn't
include the dw_mmc maintainer (Jaehoon Chung) in your post?  That
seems not so great unless you know something that get_maintainers
doesn't know about who will apply this patch.

$ ./scripts/get_maintainer.pl -f drivers/mmc/host/dw_mmc.c
Jaehoon Chung  (maintainer:SYNOPSYS DESIGNWARE
MMC/SD/SDIO DRIVER)


As far as the actual patch, at first I thought maybe we could avoid
the 64-bit math and just pre-divide bus_hz by MSEC_PER_SEC (just like
used to happen before my change), but I think there's still a chance
of overflow, right?  Because, for instance:

drto_clks can be a full 24 bits
drto_div can be effectively 9 bits

...we could do further tricks (further rounding up the result), but it
seems like just doing the 64-bit math is fine too unless someone has
any strong objections...

Reviewed-by: Douglas Anderson 

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/5] earlycon hang under some conditions

2017-07-24 Thread Doug Anderson
Hi,

On Mon, Jul 17, 2017 at 11:02 PM, Jeffy Chen  wrote:
> I was testing earlycon with 8250 dw serial console. And it hangs in
> these cases:
> 1/ kernel hang when calling early write function after free_initmem:
> a) the earlycon not disabled after the init code(due to keep_bootcon or
>not specify a real console to switch to)
> b) the early write func is marked as __init, for example 8250_early.
>
> 2/ kernel hang when calling early write function after disable unused
> clks/pm domain:
> a) the earlycon not disabled after the init code
> b) the disable unused clks/pm domain kill the requiered clks/pm
>domain, since they are not referenced by the earlycon.
>
> 3/ kernel hang when calling early write function after the serial
>console driver runtime suspended:
> a) the earlycon not disabled after the init code
> b) the serial console driver's runtime suspend kills the requiered
>clks/pm domain, since they are not referenced by the earlycon.
>
> This serial fix 1/ case only.
>
>
>
> Jeffy Chen (5):
>   serial: arc: Remove __init marking from early write
>   serial: omap: Remove __init marking from early write
>   serial: xuartps: Remove __init marking from early write
>   serial: 8250_ingenic: Remove __init marking from early write
>   serial: 8250_early: Remove __init marking from early write
>
>  drivers/tty/serial/8250/8250_early.c   |  8 
>  drivers/tty/serial/8250/8250_ingenic.c |  8 
>  drivers/tty/serial/arc_uart.c  |  4 ++--
>  drivers/tty/serial/omap-serial.c   | 13 ++---
>  drivers/tty/serial/xilinx_uartps.c |  2 +-
>  5 files changed, 17 insertions(+), 18 deletions(-)

I looked through all 5 patches and they seem sane to me.  I didn't go
through and test them since Brian already tested the patches for the
one UART driver I'd have access to anyway.

As per my previous email, I don't see any need to solve all the
world'd problems with this patch series, plus the keep_bootcon seems
to be an experts / debugging type option and it seems sane if you need
to take care in using it.

So officially for the series (FWIW since I'm just an interested 3rd party):

Reviewed-by: Douglas Anderson 


-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc


Re: [PATCH 0/5] earlycon hang under some conditions

2017-07-18 Thread Doug Anderson
Hi,

On Tue, Jul 18, 2017 at 10:51 AM, Brian Norris  wrote:
> Hi Jeffy,
>
> On Tue, Jul 18, 2017 at 02:02:53PM +0800, Jeffy Chen wrote:
>> I was testing earlycon with 8250 dw serial console. And it hangs in
>> these cases:
>> 1/ kernel hang when calling early write function after free_initmem:
>> a) the earlycon not disabled after the init code(due to keep_bootcon or
>>not specify a real console to switch to)
>> b) the early write func is marked as __init, for example 8250_early.
>
> FWIW, I tested 8250_early with "keep_bootcon", and this fixes that:
>
> Tested-by: Brian Norris 
>
> But then I get double console, if I have both a "real" and "early"
> console.
>
> If I were to *only* have the early console, then I might hit the
> problems you mention:
>
>> 2/ kernel hang when calling early write function after disable unused
>> clks/pm domain:
>> a) the earlycon not disabled after the init code
>> b) the disable unused clks/pm domain kill the requiered clks/pm
>>domain, since they are not referenced by the earlycon.
>>
>> 3/ kernel hang when calling early write function after the serial
>>console driver runtime suspended:
>> a) the earlycon not disabled after the init code
>> b) the serial console driver's runtime suspend kills the requiered
>>clks/pm domain, since they are not referenced by the earlycon.
>>
>> This serial fix 1/ case only.
>
> Problems 2 and 3 look like something that's not really in scope for
> early consoles. There's a reason they are mainly supported for early
> boot, and we try to switch off of them. There isn't really a good way to
> handle all the clock and PM infrastructure without...switching off the
> earlycon and using the real one :)
>
> So, I guess this patchset has value for systems where the clock/PM
> requirements are simple enough, and the earlycon can actually be useful
> beyond early init.

Presumably someone using this would just use the "clk_ignore_unused"
and "pd_ignore_unused" kernel parameters to help them...  In this case
that seems fine to me.  The "early" boot console is early because it
can't rely on all of the generic OS mechanisms to do things super
cleanly, so requiring the user to know that they should add the extra
command line arguments seems sane.

-Doug

___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc