On Wed, Nov 05, 2014 at 02:52:36AM +0000, Zheng, Lv wrote:
> Hi, Rafael
>
> There is one thing I should let you know.
>
> Originally this patchset is dependent on the GPE "dead lock" fix.
> Because this patch will invoke acpi_enable_gpe()/acpi_disable_gpe() with EC
> lock held.
>
> I saw system hang during suspending using only this patchset, so we have to
> find a solution.
>
> > From: Zheng, Lv
> > Sent: Monday, November 03, 2014 1:16 PM
> >
> > By using the 2 flags, we can indicate an inter-mediate state where the
> > current transactions should be completed while the new transactions should
> > be dropped.
> >
> > The comparison of the old flag and the new flags:
> >   Old                     New
> >   about to set BLOCKED    STOPPED set / STARTED set
> >   BLOCKED set             STOPPED clear / STARTED clear
> >   BLOCKED clear           STOPPED clear / STARTED set
> > The new period is between the point where we are about to set BLOCKED and
> > the point when the BLOCKED is set. The GPE is disabled during this period.
> > The new flags allow us to add acpi_ec_stopped() check to only check with
> > STOPPED flag to implement transaction flushing. This is not done in this
> > patch.
> >
> > No functional changes except that after applying this patch, the GPE
> > enabling/disabling is protected by the EC specific lock. We can do this
> > because of recent ACPICA GPE API enhancement. This is reasonable as the GPE
> > disabling/enabling state should only be determined by the EC driver's state
> > machine which is protected by the EC spinlock.
>
> This paragraph is talking about the dependency.
>
> >
> > Signed-off-by: Lv Zheng <lv.zh...@intel.com>
> > Tested-by: Ortwin Glück <o...@odi.ch>
> > ---
> >  drivers/acpi/ec.c | 56 +++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 48 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> > index 5f9b74b..192cd11 100644
> > --- a/drivers/acpi/ec.c
> > +++ b/drivers/acpi/ec.c
> > @@ -79,7 +79,8 @@ enum {
> >  	EC_FLAGS_GPE_STORM,		/* GPE storm detected */
> >  	EC_FLAGS_HANDLERS_INSTALLED,	/* Handlers for GPE and
> >  					 * OpReg are installed */
> > -	EC_FLAGS_BLOCKED,		/* Transactions are blocked */
> > +	EC_FLAGS_STARTED,		/* Driver is started */
> > +	EC_FLAGS_STOPPED,		/* Driver is stopped */
> >  };
> >
> >  #define ACPI_EC_COMMAND_POLL		0x01 /* Available for command byte */
> > @@ -129,6 +130,16 @@ static int EC_FLAGS_CLEAR_ON_RESUME; /* Needs acpi_ec_clear() on boot/resume */
> >  static int EC_FLAGS_QUERY_HANDSHAKE; /* Needs QR_EC issued when SCI_EVT set */
> >
> >  /* --------------------------------------------------------------------------
> > + *                           Device Flags
> > + * -------------------------------------------------------------------------- */
> > +
> > +static bool acpi_ec_started(struct acpi_ec *ec)
> > +{
> > +	return test_bit(EC_FLAGS_STARTED, &ec->flags) &&
> > +	       !test_bit(EC_FLAGS_STOPPED, &ec->flags);
> > +}
> > +
> > +/* --------------------------------------------------------------------------
> >   *                           Transaction Management
> >   * -------------------------------------------------------------------------- */
> >
> > @@ -354,7 +365,7 @@ static int acpi_ec_transaction(struct acpi_ec *ec, struct transaction *t)
> >  	if (t->rdata)
> >  		memset(t->rdata, 0, t->rlen);
> >  	mutex_lock(&ec->mutex);
> > -	if (test_bit(EC_FLAGS_BLOCKED, &ec->flags)) {
> > +	if (!acpi_ec_started(ec)) {
> >  		status = -EINVAL;
> >  		goto unlock;
> >  	}
> > @@ -511,6 +522,35 @@ static void acpi_ec_clear(struct acpi_ec *ec)
> >  		pr_info("%d stale EC events cleared\n", i);
> >  }
> >
> > +static void acpi_ec_start(struct acpi_ec *ec)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ec->lock, flags);
> > +	if (!test_and_set_bit(EC_FLAGS_STARTED, &ec->flags)) {
> > +		pr_debug("+++++ Starting EC +++++\n");
> > +		acpi_enable_gpe(NULL, ec->gpe);
>
> This can work without "GPE dead lock" fix applied because:
> 1. During boot, this API is called when the EC GPE is disabled.
> 2. During resume, this API is called when the EC GPE is disabled (because EC
>    GPE is always not wake capable).
>
> > +		pr_info("+++++ EC started +++++\n");
> > +	}
> > +	spin_unlock_irqrestore(&ec->lock, flags);
> > +}
> > +
> > +static void acpi_ec_stop(struct acpi_ec *ec)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ec->lock, flags);
> > +	if (acpi_ec_started(ec)) {
> > +		pr_debug("+++++ Stopping EC +++++\n");
> > +		set_bit(EC_FLAGS_STOPPED, &ec->flags);
> > +		acpi_disable_gpe(NULL, ec->gpe);
>
> But this cannot work without "GPE dead lock" fix applied because:
>
> In acpi_pm_freeze(), the call graph would be:
> acpi_pm_freeze()
>   acpi_disable_all_gpes()
>   acpi_os_wait_events_complete()
>   acpi_ec_block_transactions()
>     acpi_ec_stop()
>       hold EC lock
>       acpi_disable_gpe()
>         hold GPE lock
>
> And in the GPE handler acpi_irq(), the call graph would be:
> acpi_irq()
>   acpi_ev_sci_xrupt_handler()
>     acpi_ev_gpe_detect()
>       hold GPE lock
>       acpi_ev_gpe_dispatch()
>         acpi_ec_gpe_handler()
>           hold EC lock
>
> Since acpi_os_wait_events_complete() cannot flush GPE but can only flush
> _Lxx/_Exx evaluation work queue currently.
> The reversed ordered dead lock can happen.
> We need to fix the acpi_os_wait_events_complete() prior than this series.
> I have a fix to invoke synchronize_irq() in acpi_os_wait_events_complete().
> Let me send it to you.
> This cleanup should be applied after that fix.
>

Here's the lockdep warning I see on -next:

[    0.510159] ======================================================
[    0.510171] [ INFO: possible circular locking dependency detected ]
[    0.510185] 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66 Not tainted
[    0.510197] -------------------------------------------------------
[    0.510209] swapper/3/0 is trying to acquire lock:
[    0.510219]  (&(&ec->lock)->rlock){-.....}, at: [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
[    0.510254]
[    0.510254] but task is already holding lock:
[    0.510266]  (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
[    0.510296]
[    0.510296] which lock already depends on the new lock.
[    0.510296]
[    0.510312]
[    0.510312] the existing dependency chain (in reverse order) is:
[    0.510327]
[    0.510327] -> #1 (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}:
[    0.510344]        [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
[    0.510364]        [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
[    0.510381]        [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
[    0.510398]        [<ffffffff814e31e8>] acpi_enable_gpe+0x22/0x68
[    0.510416]        [<ffffffff814d5b24>] acpi_ec_start+0x66/0x87
[    0.510432]        [<ffffffff81afc771>] ec_install_handlers+0x41/0xa4
[    0.510449]        [<ffffffff823e72b9>] acpi_ec_ecdt_probe+0x1a9/0x1ea
[    0.510466]        [<ffffffff823e6ae3>] acpi_init+0x8b/0x26e
[    0.510480]        [<ffffffff81002148>] do_one_initcall+0xd8/0x210
[    0.510496]        [<ffffffff8239f1dc>] kernel_init_freeable+0x1f5/0x282
[    0.510513]        [<ffffffff81af1a1e>] kernel_init+0xe/0xf0
[    0.510527]        [<ffffffff81b08cfc>] ret_from_fork+0x7c/0xb0
[    0.510542]
[    0.510542] -> #0 (&(&ec->lock)->rlock){-.....}:
[    0.510558]        [<ffffffff811585ef>] __lock_acquire+0x210f/0x2220
[    0.510574]        [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
[    0.510589]        [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
[    0.510604]        [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
[    0.510620]        [<ffffffff814e02c2>] acpi_ev_gpe_dispatch+0xd2/0x143
[    0.510636]        [<ffffffff814e03fb>] acpi_ev_gpe_detect+0xc8/0x10f
[    0.510652]        [<ffffffff814e23b6>] acpi_ev_sci_xrupt_handler+0x22/0x38
[    0.510669]        [<ffffffff814cc8ee>] acpi_irq+0x16/0x31
[    0.510684]        [<ffffffff8116eccf>] handle_irq_event_percpu+0x6f/0x540
[    0.510702]        [<ffffffff8116f1e1>] handle_irq_event+0x41/0x70
[    0.510718]        [<ffffffff81171ef6>] handle_fasteoi_irq+0x86/0x140
[    0.510733]        [<ffffffff81075a22>] handle_irq+0x22/0x40
[    0.510748]        [<ffffffff81b0beaf>] do_IRQ+0x4f/0xf0
[    0.510762]        [<ffffffff81b09bb2>] ret_from_intr+0x0/0x1a
[    0.510777]        [<ffffffff8107e783>] default_idle+0x23/0x260
[    0.510792]        [<ffffffff8107f35f>] arch_cpu_idle+0xf/0x20
[    0.510806]        [<ffffffff8114a99b>] cpu_startup_entry+0x36b/0x5b0
[    0.510821]        [<ffffffff810a8d04>] start_secondary+0x1a4/0x1d0
[    0.510840]
[    0.510840] other info that might help us debug this:
[    0.510840]
[    0.510856]  Possible unsafe locking scenario:
[    0.510856]
[    0.510868]        CPU0                    CPU1
[    0.510877]        ----                    ----
[    0.510886]   lock(&(*(&acpi_gbl_gpe_lock))->rlock);
[    0.510898]                                lock(&(&ec->lock)->rlock);
[    0.510912]                                lock(&(*(&acpi_gbl_gpe_lock))->rlock);
[    0.510927]   lock(&(&ec->lock)->rlock);
[    0.510938]
[    0.510938]  *** DEADLOCK ***
[    0.510938]
[    0.510953] 1 lock held by swapper/3/0:
[    0.510961]  #0:  (&(*(&acpi_gbl_gpe_lock))->rlock){-.....}, at: [<ffffffff814cd67e>] acpi_os_acquire_lock+0xe/0x10
[    0.510990]
[    0.510990] stack backtrace:
[    0.511004] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.18.0-rc4-next-20141117-07404-g9dad2ab6df8b #66
[    0.511021] Hardware name: LENOVO 3460CC6/3460CC6, BIOS G6ET93WW (2.53 ) 02/04/2013
[    0.511035]  ffffffff82cb2f70 ffff88011e2c3bb8 ffffffff81afc316 0000000000000011
[    0.511055]  ffffffff82cb2f70 ffff88011e2c3c08 ffffffff81afae11 0000000000000001
[    0.511074]  ffff88011e2c3c68 ffff88011e2c3c08 ffff8801193f92d0 ffff8801193f9b20
[    0.511094] Call Trace:
[    0.511101]  <IRQ>  [<ffffffff81afc316>] dump_stack+0x4c/0x6e
[    0.511125]  [<ffffffff81afae11>] print_circular_bug+0x2b2/0x2c3
[    0.511142]  [<ffffffff811585ef>] __lock_acquire+0x210f/0x2220
[    0.511159]  [<ffffffff81158f4f>] lock_acquire+0xdf/0x2d0
[    0.511176]  [<ffffffff814d533e>] ? acpi_ec_gpe_handler+0x21/0xfc
[    0.511192]  [<ffffffff81b08010>] _raw_spin_lock_irqsave+0x50/0x70
[    0.511209]  [<ffffffff814d533e>] ? acpi_ec_gpe_handler+0x21/0xfc
[    0.511225]  [<ffffffff814ea192>] ? acpi_hw_write+0x4b/0x52
[    0.511241]  [<ffffffff814d533e>] acpi_ec_gpe_handler+0x21/0xfc
[    0.511258]  [<ffffffff814e02c2>] acpi_ev_gpe_dispatch+0xd2/0x143
[    0.511274]  [<ffffffff814e03fb>] acpi_ev_gpe_detect+0xc8/0x10f
[    0.511292]  [<ffffffff814e23b6>] acpi_ev_sci_xrupt_handler+0x22/0x38
[    0.511309]  [<ffffffff814cc8ee>] acpi_irq+0x16/0x31
[    0.511325]  [<ffffffff8116eccf>] handle_irq_event_percpu+0x6f/0x540
[    0.511342]  [<ffffffff8116f1e1>] handle_irq_event+0x41/0x70
[    0.511357]  [<ffffffff81171e98>] ? handle_fasteoi_irq+0x28/0x140
[    0.511372]  [<ffffffff81171ef6>] handle_fasteoi_irq+0x86/0x140
[    0.511388]  [<ffffffff81075a22>] handle_irq+0x22/0x40
[    0.511402]  [<ffffffff81b0beaf>] do_IRQ+0x4f/0xf0
[    0.511417]  [<ffffffff81b09bb2>] common_interrupt+0x72/0x72
[    0.511428]  <EOI>  [<ffffffff810b8986>] ? native_safe_halt+0x6/0x10
[    0.511454]  [<ffffffff81154f3d>] ? trace_hardirqs_on+0xd/0x10
[    0.511468]  [<ffffffff8107e783>] default_idle+0x23/0x260
[    0.511482]  [<ffffffff8107f35f>] arch_cpu_idle+0xf/0x20
[    0.511496]  [<ffffffff8114a99b>] cpu_startup_entry+0x36b/0x5b0
[    0.511512]  [<ffffffff810a8d04>] start_secondary+0x1a4/0x1d0

--
 Kirill A. Shutemov