Re: [PATCH 00/23] ALUA device handler update, part II
On 02/08/2016 06:34 AM, Hannes Reinecke wrote:
> as promised here is now the second part of my ALUA device handler update.

Hello Hannes,

Please test this patch series with lockdep enabled and fix the resulting
complaints. This is what was reported on my test setup shortly after
multipathd was started:

=====================================
[ BUG: bad unlock balance detected! ]
4.5.0-rc3+ #6 Tainted: GE
-------------------------------------
kworker/3:1/141 is trying to release lock (port_group_lock) at:
[] alua_rtpg+0x329/0x890 [scsi_dh_alua]
but there are no more locks to release!

other info that might help us debug this:
2 locks held by kworker/3:1/141:
 #0: ("kaluad"){.+.+.+}, at: [] process_one_work+0x16a/0x480
 #1: ((&(&pg->rtpg_work)->work)){+.+.+.}, at: [] process_one_work+0x16a/0x480

stack backtrace:
CPU: 3 PID: 141 Comm: kworker/3:1 Tainted: GE 4.5.0-rc3+ #6
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
 880456807978 81263ba7 0007 0006 880456ff9f80 a041be19 8804568079a8 810a29a9 8803f61b7bb8 880456ff9f80 a041d878
Call Trace:
 [] dump_stack+0x6b/0xa4
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] print_unlock_imbalance_bug+0xf9/0x100
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] __lock_release+0x25f/0x3a0
 [] ? __lock_release+0xc4/0x3a0
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] lock_release+0x39/0x60
 [] _raw_spin_unlock_irqrestore+0x29/0x60
 [] alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] ? alua_rtpg+0x3d5/0x890 [scsi_dh_alua]
 [] ? __lock_release+0xc4/0x3a0
 [] ? check_usage_forwards+0x100/0x100
 [] ? mark_held_locks+0x71/0x90
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? trace_hardirqs_on_caller+0xfc/0x1c0
 [] alua_rtpg_work+0x1be/0x370 [scsi_dh_alua]
 [] process_one_work+0x1da/0x480
 [] ? process_one_work+0x16a/0x480
 [] ? __lock_release+0xc4/0x3a0
 [] worker_thread+0x169/0x520
 [] ? complete+0x48/0x60
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? maybe_create_worker+0x110/0x110
 [] ? maybe_create_worker+0x110/0x110
 [] ? schedule+0x42/0xb0
 [] ? maybe_create_worker+0x110/0x110
 [] kthread+0xe4/0x100
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? schedule_tail+0x19/0xd0
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x3f/0x70
 [] ? __init_kthread_worker+0x70/0x70
sd 13:0:0:1: alua: port group 101 state A preferred supports tOlUSNA
BUG: workqueue leaked lock or atomic: kworker/3:1/0x7ffe/141
     last function: alua_rtpg_work [scsi_dh_alua]
INFO: lockdep is turned off.
CPU: 3 PID: 141 Comm: kworker/3:1 Tainted: GE 4.5.0-rc3+ #6
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
 880456807c38 81263ba7 0001 880457355960 880456ff9f80 880456807d28 81072472 8107225a 810a6714 e8fffec71f05
Call Trace:
 [] dump_stack+0x6b/0xa4
 [] process_one_work+0x382/0x480
 [] ? process_one_work+0x16a/0x480
 [] ? __lock_release+0xc4/0x3a0
 [] worker_thread+0x169/0x520
 [] ? complete+0x48/0x60
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? maybe_create_worker+0x110/0x110
 [] ? maybe_create_worker+0x110/0x110
 [] ? schedule+0x42/0xb0
 [] ? maybe_create_worker+0x110/0x110
 [] kthread+0xe4/0x100
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? schedule_tail+0x19/0xd0
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x3f/0x70
 [] ? __init_kthread_worker+0x70/0x70

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
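For readers unfamiliar with this kind of lockdep report: "bad unlock balance"
means that an unlock was attempted on a lock the current context does not
hold. What follows is a minimal userspace sketch of that class of bug, not
the actual code path in alua_rtpg(); the trace does not show which branch is
at fault. All names (tracked_lock, update_state_buggy, update_state_fixed)
are hypothetical, and the "held" counter only loosely imitates the
bookkeeping lockdep does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a lock plus a per-context hold count. */
struct tracked_lock {
	const char *name;
	int held;        /* how often the current context holds the lock */
	int violations;  /* unbalanced unlock attempts seen so far */
};

static void tracked_acquire(struct tracked_lock *l)
{
	l->held++;
}

/* Returns false and records a violation if the lock is not held. */
static bool tracked_release(struct tracked_lock *l)
{
	if (l->held == 0) {
		l->violations++;
		fprintf(stderr, "bad unlock balance: %s is not held\n", l->name);
		return false;
	}
	l->held--;
	return true;
}

/* Assumed bug pattern: an early error path jumps to the unlock label. */
static int update_state_buggy(struct tracked_lock *l, bool fail_early)
{
	if (fail_early)
		goto out_unlock;        /* BUG: the lock was never taken */
	tracked_acquire(l);
	/* ... update the port group state under the lock ... */
out_unlock:
	tracked_release(l);
	return fail_early ? -1 : 0;
}

/* Same function with the early error path returning before any unlock. */
static int update_state_fixed(struct tracked_lock *l, bool fail_early)
{
	if (fail_early)
		return -1;
	tracked_acquire(l);
	/* ... update the port group state under the lock ... */
	tracked_release(l);
	return 0;
}
```

Calling update_state_buggy() with fail_early set records one violation,
while update_state_fixed() leaves the counters balanced on both paths.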
Re: [PATCH 00/23] ALUA device handler update, part II
On 02/10/2016 05:37 PM, Martin K. Petersen wrote:
> "Bart" == Bart Van Assche writes:
>
> Bart> I will try to free up some time to help with reviewing and testing
> Bart> this patch series. But before I can do that the v4.5-rc multipath
> Bart> code needs to be stabilized first. See also
> Bart> https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.

Hello Martin,

I will start today with testing and reviewing this patch series.

Bart.
Re: [PATCH 00/23] ALUA device handler update, part II
> "Bart" == Bart Van Assche writes:

Bart,

Bart> I will try to free up some time to help with reviewing and testing
Bart> this patch series. But before I can do that the v4.5-rc multipath
Bart> code needs to be stabilized first. See also
Bart> https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.

Have you had time to take a look?

--
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH 00/23] ALUA device handler update, part II
On 02/08/2016 06:37 AM, Hannes Reinecke wrote:
> And for the impatient I've pushed the entire patchset to my kernel
> repository at kernel.org:
>
> kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git alua-2.v5

Hello Hannes,

I will try to free up some time to help with reviewing and testing this
patch series. But before I can do that the v4.5-rc multipath code needs to
be stabilized first. See also
https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.

Bart.
Re: [PATCH 00/23] ALUA device handler update, part II
On 02/08/2016 03:34 PM, Hannes Reinecke wrote:
> as promised here is now the second part of my ALUA device handler update.
> This contains a major rework of the ALUA device handler as execution is
> moved onto a workqueue. This has the advantage that we avoid having to
> do multiple calls to the same LUN (as happens frequently when failing
> over a LUN with several paths) and finally retries are handled correctly.
> As some arrays are only capable of handling one STPG at a time I've added
> a blacklist flag which then uses a single-threaded workqueue, thereby
> effectively synchronizing STPG handling.
> Thanks to Bart for this suggestion.
>
> As usual, comments and reviews are welcome.
>
> Changes to v4:
> - use kfree_rcu() as suggested by hch
> - Use 'IS_ERR' instead of 'PTR_ERR' when checking for validity
>   of a pointer
> - Simplify pg assignment as suggested by hch
> - Use separate WARN_ON statements as suggested by hch
> - Fixes to avoid I/O stall on failover
>
> Changes to v3:
> - Use scsi_device flag for blacklisting as suggested by hch
> - Add Arrays for synchronous ALUA handling
> - Move synchronize_rcu() into release_port_group()
> - Add remaining reviewed tags
>
> Changes to v2:
> - Use a SCSI blacklist flag instead of a hardware handler parameter
>   for switching to synchronous ALUA handling
> - Move scsi_get_device_flags{,_keyed} to scsi_devinfo.h
> - Move flush_delayed_work() into release_port_group()
> - Rename alua_lookup_pg() into alua_find_get_pg()
> - Add __rcu annotations to keep sparse happy
>
> Changes to v1:
> - Include reviews from hch
> - Switch to hardware handler parameter instead of module option
>
> Hannes Reinecke (23):
>   scsi_dh_alua: Pass buffer as function argument
>   scsi_dh_alua: separate out alua_stpg()
>   scsi_dh_alua: Make stpg synchronous
>   scsi_dh_alua: call alua_rtpg() if stpg fails
>   scsi_dh_alua: switch to scsi_execute_req_flags()
>   scsi_dh_alua: allocate RTPG buffer separately
>   scsi_dh_alua: Use separate alua_port_group structure
>   scsi_dh_alua: use unique device id
>   scsi_dh_alua: simplify alua_initialize()
>   revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach
>     should succeed while TPG is transitioning")
>   scsi_dh_alua: move optimize_stpg evaluation
>   scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
>   scsi_dh_alua: Use workqueue for RTPG
>   scsi_dh_alua: Allow workqueue to run synchronously
>   scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
>   scsi_dh_alua: Recheck state on unit attention
>   scsi_dh_alua: update all port states
>   scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
>   scsi_dh: add 'rescan' callback
>   scsi: Add 'access_state' attribute
>   scsi_dh_alua: use common definitions for ALUA state
>   scsi_dh_alua: update 'access_state' field
>   scsi_dh_alua: Update version to 2.0
>
>  drivers/scsi/device_handler/scsi_dh_alua.c | 988 -
>  drivers/scsi/scsi_devinfo.c                |   2 +
>  drivers/scsi/scsi_lib.c                    |   1 +
>  drivers/scsi/scsi_scan.c                   |  12 +-
>  drivers/scsi/scsi_sysfs.c                  |  49 ++
>  include/scsi/scsi_device.h                 |   2 +
>  include/scsi/scsi_devinfo.h                |   1 +
>  include/scsi/scsi_dh.h                     |   2 +
>  include/scsi/scsi_proto.h                  |  13 +
>  9 files changed, 763 insertions(+), 307 deletions(-)
>

And for the impatient I've pushed the entire patchset to my kernel
repository at kernel.org:

kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git alua-2.v5

Cheers,

Hannes
--
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de                   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
[PATCH 00/23] ALUA device handler update, part II
as promised here is now the second part of my ALUA device handler update.
This contains a major rework of the ALUA device handler as execution is
moved onto a workqueue. This has the advantage that we avoid having to
do multiple calls to the same LUN (as happens frequently when failing
over a LUN with several paths) and finally retries are handled correctly.
As some arrays are only capable of handling one STPG at a time I've added
a blacklist flag which then uses a single-threaded workqueue, thereby
effectively synchronizing STPG handling.
Thanks to Bart for this suggestion.

As usual, comments and reviews are welcome.

Changes to v4:
- use kfree_rcu() as suggested by hch
- Use 'IS_ERR' instead of 'PTR_ERR' when checking for validity
  of a pointer
- Simplify pg assignment as suggested by hch
- Use separate WARN_ON statements as suggested by hch
- Fixes to avoid I/O stall on failover

Changes to v3:
- Use scsi_device flag for blacklisting as suggested by hch
- Add Arrays for synchronous ALUA handling
- Move synchronize_rcu() into release_port_group()
- Add remaining reviewed tags

Changes to v2:
- Use a SCSI blacklist flag instead of a hardware handler parameter
  for switching to synchronous ALUA handling
- Move scsi_get_device_flags{,_keyed} to scsi_devinfo.h
- Move flush_delayed_work() into release_port_group()
- Rename alua_lookup_pg() into alua_find_get_pg()
- Add __rcu annotations to keep sparse happy

Changes to v1:
- Include reviews from hch
- Switch to hardware handler parameter instead of module option

Hannes Reinecke (23):
  scsi_dh_alua: Pass buffer as function argument
  scsi_dh_alua: separate out alua_stpg()
  scsi_dh_alua: Make stpg synchronous
  scsi_dh_alua: call alua_rtpg() if stpg fails
  scsi_dh_alua: switch to scsi_execute_req_flags()
  scsi_dh_alua: allocate RTPG buffer separately
  scsi_dh_alua: Use separate alua_port_group structure
  scsi_dh_alua: use unique device id
  scsi_dh_alua: simplify alua_initialize()
  revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach
    should succeed while TPG is transitioning")
  scsi_dh_alua: move optimize_stpg evaluation
  scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
  scsi_dh_alua: Use workqueue for RTPG
  scsi_dh_alua: Allow workqueue to run synchronously
  scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
  scsi_dh_alua: Recheck state on unit attention
  scsi_dh_alua: update all port states
  scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
  scsi_dh: add 'rescan' callback
  scsi: Add 'access_state' attribute
  scsi_dh_alua: use common definitions for ALUA state
  scsi_dh_alua: update 'access_state' field
  scsi_dh_alua: Update version to 2.0

 drivers/scsi/device_handler/scsi_dh_alua.c | 988 -
 drivers/scsi/scsi_devinfo.c                |   2 +
 drivers/scsi/scsi_lib.c                    |   1 +
 drivers/scsi/scsi_scan.c                   |  12 +-
 drivers/scsi/scsi_sysfs.c                  |  49 ++
 include/scsi/scsi_device.h                 |   2 +
 include/scsi/scsi_devinfo.h                |   1 +
 include/scsi/scsi_dh.h                     |   2 +
 include/scsi/scsi_proto.h                  |  13 +
 9 files changed, 763 insertions(+), 307 deletions(-)

--
1.8.5.6
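The coalescing effect described in the cover letter (several paths to one
LUN failing over at once, but only one request going to the array) can be
sketched in plain userspace C. This is an illustration under assumptions,
not code from the patch series: port_group_sim, sim_queue_rtpg() and
sim_run_work() are hypothetical names, and the "work_pending" flag merely
stands in for a work item that is already queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one port group with a single pending work item. */
struct port_group_sim {
	bool work_pending;    /* a work item for this group is already queued */
	int rtpg_sent;        /* RTPG commands actually issued to the array */
	int callers_waiting;  /* activation requests waiting for a result */
};

/*
 * Called once per failing path. Returns true only when this call queued
 * the work item; later callers just attach to the pending one.
 */
static bool sim_queue_rtpg(struct port_group_sim *pg)
{
	pg->callers_waiting++;
	if (pg->work_pending)
		return false;
	pg->work_pending = true;
	return true;
}

/*
 * Runs the queued work: one RTPG serves every waiting caller.
 * Returns the number of callers completed by this run.
 */
static int sim_run_work(struct port_group_sim *pg)
{
	int completed;

	if (!pg->work_pending)
		return 0;
	pg->rtpg_sent++;
	completed = pg->callers_waiting;
	pg->callers_waiting = 0;
	pg->work_pending = false;
	return completed;
}
```

With three paths requesting activation before the work runs, only the first
sim_queue_rtpg() call returns true, and a single sim_run_work() completes
all three callers with one RTPG. Running such work items from a
single-threaded queue would additionally serialize them across port groups,
which is the behaviour the cover letter describes for arrays that can only
handle one STPG at a time.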