At Tue, 12 May 2015 18:18:13 -0700,
Laura Abbott wrote:
>
> (cc-ing linux-usb as well)
>
> On 05/12/2015 08:14 AM, Marcel Holtmann wrote:
> > Hi Laura,
> >
> >>>> We've received a number of reports of warnings when coming
> >>>> out of suspend with certain bluetooth firmware configurations:
> >>>>
> >>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
> >>>> _request_firmware+0x558/0x810()
> >>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> >>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> >>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> >>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> >>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> >>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
> >>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
> >>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
> >>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211 iTCO_vendor_support
> >>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
> >>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel uvcvideo
> >>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm videobuf2_core
> >>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
> >>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom mmc_core
> >>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd mei_me mei
> >>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
> >>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
> >>>> video
> >>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted 3.19.3-200.fc21.x86_64
> >>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 ) 10/22/2013
> >>>> Workqueue: hci0 hci_power_on [bluetooth]
> >>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
> >>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
> >>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
> >>>> Call Trace:
> >>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
> >>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
> >>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
> >>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
> >>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
> >>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
> >>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
> >>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
> >>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
> >>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
> >>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
> >>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
> >>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
> >>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
> >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
> >>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
> >>>>
> >>>> This occurs after every resume.
> >>>>
> >>>> When resuming, the bluetooth stack calls hci_register_dev,
> >>>> allocates a new workqueue, and immediately schedules the
> >>>> power_on on the newly created workqueue. Since the new
> >>>> workqueue is not freezable, the work runs immediately and
> >>>> triggers the warning since resume is still happening and
> >>>> usermodehelper has not yet been re-enabled. Fix this by
> >>>> making the request workqueue freezable. This ensures
> >>>> the work will not run until unfreezing occurs and usermodehelper
> >>>> is re-enabled.
> >>>>
> >>>> Signed-off-by: Laura Abbott <[email protected]>
> >>>> ---
> >>>> Resend because I think this got lost in the thread.
> >>>> This should be fixing the actual root cause of the warnings.
> >>>
> >>> so I am not convinced that it actually fixes the root cause. This is just
> >>> papering over it.
> >>>
> >>> The problem is pretty clear, the firmware for some of the Bluetooth
> >>> controllers is optional and that means during the original hotplug event
> >>> it will not be found and the controller keeps operating. However for some
> >>> reason instead of actually suspending and resuming the Bluetooth
> >>> controller, we see a unplug + replug (since we are going through probe)
> >>> and that is causing this funny behaviour.
> >>>
> >>
> >> Fundamentally the issue is the request_firmware is being called at the
> >> wrong time. From Documentation/workqueue.txt:
> >>
> >> WQ_FREEZABLE
> >>
> >> A freezable wq participates in the freeze phase of the system
> >> suspend operations. Work items on the wq are drained and no
> >> new work item starts execution until thawed.
> >>
> >>
> >> By making the request workqueue freezable, any work that gets scheduled
> >> will not run until the time for tasks to unthaw.
> >> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
> >> ("PM / sleep: Fix request_firmware() error at resume") fixed the resume
> >> path such that before all tasks are unthawed, calls to
> >> usermodehelper_read_trylock will block until usermodehelper is fully
> >> resumed. This means that any task which is frozen and then woken up
> >> again should have the right sequencing for usermodehelper. The workqueue
> >> which handled the bluetooth power on was never being frozen properly so
> >> there was never any guarantee of when it would run. This patch gives
> >> it the necessary sequence.
> >>
> >>> So how does making one of the core workqueues freezable fixes this the
> >>> right way. I do not even know how many other side effects that might
> >>> have. That hdev->req_workqueue is a Bluetooth core internal workqueue
> >>> that we are using for multiple tasks.
> >>>
> >>> Rather tell me on why we are probing the USB devices that might need
> >>> firmware without having userspace ready. It sounds to me that the USB
> >>> driver probe callback should be delayed if we can not guarantee that it
> >>> can request firmware. As I explained many times, the call path that
> >>> causes this is going through probe callback of the driver itself.
> >>>
> >>
> >> I agree that if the driver probe function was requesting firmware
> >> directly there would be a problem. The power_on function is already
> >> being called asynchronously on a workqueue. Making that workqueue
> >> freezable does exactly the delay you describe.
> >
> > I am not convinced. Now we are hacking the Bluetooth core layer (which has
> > nothing to do with the drivers suspend/resume or probe) to do something
> > different so that we do not see this warning.
> >
> > I can not do anything about the platform in question choosing a
> > unplug/replug for suspend/resume instead of having a proper USB suspend and
> > resume handling. That is pretty much out of our control. I would rather
> > have the USB subsystem delay the probe() callback if we tell it to. Of just
> > have request_firmware() actually sleep until userspace is ready. Seriously,
> > why is request_firmware not just sleeping for us.
> >
>
> The closest thing to blocking is usermodehelper_read_lock_wait which
> waits for a limited amount of time. Takashi Iwai proposed switching
> to that unconditionally for all request_firmware but I never saw a
> response from the firmware maintainers. I suspect that may not be
> acceptable because if the firmware actually needs to block it should
> be an asynchronous call. The firmware maintainers can correct me
> if I'm incorrect in my understanding.
IIRC, the reason of using usermodehelper_read_trylock() for the normal
request_firmware() (not the nowait one) is to check the call of
request_firmware() in the resume callback. If the firmware hasn't
been cached, it should fail.
So, using _trylock() there isn't wrong, per se. It's indeed safer.
But, the problem is that _trylock() is used unconditionally for all
request_firmware() calls even if it's never from the resume path.
Maybe we should allow the f/w loader caller to specify whether to use
UMH trylock or wait. The patch below exposes _request_firmware() and
FW_OPT_ flags. Then BT driver can call like
_request_firmware(&fw, name, dev,
FW_OPT_UEVENT | FW_OPT_NO_WARN | FW_OPT_UMH_LOCK_WAIT)
Note that the patch is totally untested!
Or doesn't this look intuitive enough?
Takashi
---
diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
index 171841ad1008..47eb5551c119 100644
--- a/drivers/base/firmware_class.c
+++ b/drivers/base/firmware_class.c
@@ -97,21 +97,6 @@ static inline long firmware_loading_timeout(void)
return loading_timeout > 0 ? loading_timeout * HZ : MAX_JIFFY_OFFSET;
}
-/* firmware behavior options */
-#define FW_OPT_UEVENT (1U << 0)
-#define FW_OPT_NOWAIT (1U << 1)
-#ifdef CONFIG_FW_LOADER_USER_HELPER
-#define FW_OPT_USERHELPER (1U << 2)
-#else
-#define FW_OPT_USERHELPER 0
-#endif
-#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK
-#define FW_OPT_FALLBACK FW_OPT_USERHELPER
-#else
-#define FW_OPT_FALLBACK 0
-#endif
-#define FW_OPT_NO_WARN (1U << 3)
-
struct firmware_cache {
/* firmware_buf instance will be added into the below list */
spinlock_t lock;
@@ -1085,7 +1070,7 @@ static int assign_firmware_buf(struct firmware *fw,
struct device *device,
}
/* called from request_firmware() and request_firmware_work_func() */
-static int
+int
_request_firmware(const struct firmware **firmware_p, const char *name,
struct device *device, unsigned int opt_flags)
{
@@ -1099,13 +1084,16 @@ _request_firmware(const struct firmware **firmware_p,
const char *name,
if (!name || name[0] == '\0')
return -EINVAL;
+ /* Need to pin this module until return */
+ __module_get(THIS_MODULE);
+
ret = _request_firmware_prepare(&fw, name, device);
if (ret <= 0) /* error or already assigned */
goto out;
ret = 0;
timeout = firmware_loading_timeout();
- if (opt_flags & FW_OPT_NOWAIT) {
+ if (opt_flags & FW_OPT_UMH_LOCK_WAIT) {
timeout = usermodehelper_read_lock_wait(timeout);
if (!timeout) {
dev_dbg(device, "firmware: %s loading timed out\n",
@@ -1147,67 +1135,10 @@ _request_firmware(const struct firmware **firmware_p,
const char *name,
}
*firmware_p = fw;
- return ret;
-}
-
-/**
- * request_firmware: - send firmware request and wait for it
- * @firmware_p: pointer to firmware image
- * @name: name of firmware file
- * @device: device for which firmware is being loaded
- *
- * @firmware_p will be used to return a firmware image by the name
- * of @name for device @device.
- *
- * Should be called from user context where sleeping is allowed.
- *
- * @name will be used as $FIRMWARE in the uevent environment and
- * should be distinctive enough not to be confused with any other
- * firmware image for this or any other device.
- *
- * Caller must hold the reference count of @device.
- *
- * The function can be called safely inside device's suspend and
- * resume callback.
- **/
-int
-request_firmware(const struct firmware **firmware_p, const char *name,
- struct device *device)
-{
- int ret;
-
- /* Need to pin this module until return */
- __module_get(THIS_MODULE);
- ret = _request_firmware(firmware_p, name, device,
- FW_OPT_UEVENT | FW_OPT_FALLBACK);
- module_put(THIS_MODULE);
- return ret;
-}
-EXPORT_SYMBOL(request_firmware);
-
-/**
- * request_firmware_direct: - load firmware directly without usermode helper
- * @firmware_p: pointer to firmware image
- * @name: name of firmware file
- * @device: device for which firmware is being loaded
- *
- * This function works pretty much like request_firmware(), but this doesn't
- * fall back to usermode helper even if the firmware couldn't be loaded
- * directly from fs. Hence it's useful for loading optional firmwares, which
- * aren't always present, without extra long timeouts of udev.
- **/
-int request_firmware_direct(const struct firmware **firmware_p,
- const char *name, struct device *device)
-{
- int ret;
-
- __module_get(THIS_MODULE);
- ret = _request_firmware(firmware_p, name, device,
- FW_OPT_UEVENT | FW_OPT_NO_WARN);
module_put(THIS_MODULE);
return ret;
}
-EXPORT_SYMBOL_GPL(request_firmware_direct);
+EXPORT_SYMBOL_GPL(_request_firmware);
/**
* release_firmware: - release the resource associated with a firmware image
@@ -1291,6 +1222,7 @@ request_firmware_nowait(
fw_work->context = context;
fw_work->cont = cont;
fw_work->opt_flags = FW_OPT_NOWAIT | FW_OPT_FALLBACK |
+ FW_OPT_UMH_LOCK_WAIT |
(uevent ? FW_OPT_UEVENT : FW_OPT_USERHELPER);
if (!try_module_get(module)) {
diff --git a/include/linux/firmware.h b/include/linux/firmware.h
index 5c41c5e75b5c..460aa30965cf 100644
--- a/include/linux/firmware.h
+++ b/include/linux/firmware.h
@@ -39,23 +39,21 @@ struct builtin_fw {
__used __section(.builtin_fw) = { name, blob, size }
#if defined(CONFIG_FW_LOADER) || (defined(CONFIG_FW_LOADER_MODULE) &&
defined(MODULE))
-int request_firmware(const struct firmware **fw, const char *name,
- struct device *device);
+int _request_firmware(const struct firmware **firmware_p, const char *name,
+ struct device *device, unsigned int opt_flags);
int request_firmware_nowait(
struct module *module, bool uevent,
const char *name, struct device *device, gfp_t gfp, void *context,
void (*cont)(const struct firmware *fw, void *context));
-int request_firmware_direct(const struct firmware **fw, const char *name,
- struct device *device);
-
void release_firmware(const struct firmware *fw);
#else
-static inline int request_firmware(const struct firmware **fw,
- const char *name,
- struct device *device)
+static inline int
+_request_firmware(const struct firmware **firmware_p, const char *name,
+ struct device *device, unsigned int opt_flags)
{
return -EINVAL;
}
+
static inline int request_firmware_nowait(
struct module *module, bool uevent,
const char *name, struct device *device, gfp_t gfp, void *context,
@@ -67,13 +65,69 @@ static inline int request_firmware_nowait(
static inline void release_firmware(const struct firmware *fw)
{
}
+#endif
-static inline int request_firmware_direct(const struct firmware **fw,
- const char *name,
- struct device *device)
+/* firmware behavior options */
+#define FW_OPT_UEVENT (1U << 0)
+#define FW_OPT_NOWAIT (1U << 1)
+#ifdef CONFIG_FW_LOADER_USER_HELPER
+#define FW_OPT_USERHELPER (1U << 2)
+#else
+#define FW_OPT_USERHELPER 0
+#endif
+#ifdef CONFIG_FW_LOADER_USER_HELPER_FALLBACK
+#define FW_OPT_FALLBACK FW_OPT_USERHELPER
+#else
+#define FW_OPT_FALLBACK 0
+#endif
+#define FW_OPT_NO_WARN (1U << 3)
+#define FW_OPT_UMH_LOCK_WAIT (1U << 4)
+
+/**
+ * request_firmware: - send firmware request and wait for it
+ * @firmware_p: pointer to firmware image
+ * @name: name of firmware file
+ * @device: device for which firmware is being loaded
+ *
+ * @firmware_p will be used to return a firmware image by the name
+ * of @name for device @device.
+ *
+ * Should be called from user context where sleeping is allowed.
+ *
+ * @name will be used as $FIRMWARE in the uevent environment and
+ * should be distinctive enough not to be confused with any other
+ * firmware image for this or any other device.
+ *
+ * Caller must hold the reference count of @device.
+ *
+ * The function can be called safely inside device's suspend and
+ * resume callback.
+ **/
+static inline int
+request_firmware(const struct firmware **firmware_p, const char *name,
+ struct device *device)
{
- return -EINVAL;
+ return _request_firmware(firmware_p, name, device,
+ FW_OPT_UEVENT | FW_OPT_FALLBACK);
+}
+
+/**
+ * request_firmware_direct: - load firmware directly without usermode helper
+ * @firmware_p: pointer to firmware image
+ * @name: name of firmware file
+ * @device: device for which firmware is being loaded
+ *
+ * This function works pretty much like request_firmware(), but this doesn't
+ * fall back to usermode helper even if the firmware couldn't be loaded
+ * directly from fs. Hence it's useful for loading optional firmwares, which
+ * aren't always present, without extra long timeouts of udev.
+ **/
+static inline int
+request_firmware_direct(const struct firmware **firmware_p,
+ const char *name, struct device *device)
+{
+ return _request_firmware(firmware_p, name, device,
+ FW_OPT_UEVENT | FW_OPT_NO_WARN);
}
-#endif
#endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/