Re: [PATCH 00/23] ALUA device handler update, part II

2016-02-11 Thread Bart Van Assche
On 02/08/2016 06:34 AM, Hannes Reinecke wrote:
> as promised here is now the second part of my ALUA device handler update.

Hello Hannes,

Please test this patch series with lockdep enabled and fix the
resulting complaints. This is what was reported on my test setup
shortly after multipathd was started:


=====================================
[ BUG: bad unlock balance detected! ]
4.5.0-rc3+ #6 Tainted: GE  
-------------------------------------
kworker/3:1/141 is trying to release lock (port_group_lock) at:
[] alua_rtpg+0x329/0x890 [scsi_dh_alua]
but there are no more locks to release!

other info that might help us debug this:
2 locks held by kworker/3:1/141:
 #0:  ("kaluad"){.+.+.+}, at: [] process_one_work+0x16a/0x480
 #1:  ((&(&pg->rtpg_work)->work)){+.+.+.}, at: [] process_one_work+0x16a/0x480

stack backtrace:
CPU: 3 PID: 141 Comm: kworker/3:1 Tainted: GE   4.5.0-rc3+ #6
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
  880456807978 81263ba7 0007
 0006 880456ff9f80 a041be19 8804568079a8
 810a29a9 8803f61b7bb8 880456ff9f80 a041d878
Call Trace:
 [] dump_stack+0x6b/0xa4
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] print_unlock_imbalance_bug+0xf9/0x100
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] __lock_release+0x25f/0x3a0
 [] ? __lock_release+0xc4/0x3a0
 [] ? alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] lock_release+0x39/0x60
 [] _raw_spin_unlock_irqrestore+0x29/0x60
 [] alua_rtpg+0x329/0x890 [scsi_dh_alua]
 [] ? alua_rtpg+0x3d5/0x890 [scsi_dh_alua]
 [] ? __lock_release+0xc4/0x3a0
 [] ? check_usage_forwards+0x100/0x100
 [] ? mark_held_locks+0x71/0x90
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? trace_hardirqs_on_caller+0xfc/0x1c0
 [] alua_rtpg_work+0x1be/0x370 [scsi_dh_alua]
 [] process_one_work+0x1da/0x480
 [] ? process_one_work+0x16a/0x480
 [] ? __lock_release+0xc4/0x3a0
 [] worker_thread+0x169/0x520
 [] ? complete+0x48/0x60
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? maybe_create_worker+0x110/0x110
 [] ? maybe_create_worker+0x110/0x110
 [] ? schedule+0x42/0xb0
 [] ? maybe_create_worker+0x110/0x110
 [] kthread+0xe4/0x100
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? schedule_tail+0x19/0xd0
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x3f/0x70
 [] ? __init_kthread_worker+0x70/0x70
sd 13:0:0:1: alua: port group 101 state A preferred supports tOlUSNA
BUG: workqueue leaked lock or atomic: kworker/3:1/0x7ffe/141
 last function: alua_rtpg_work [scsi_dh_alua]
INFO: lockdep is turned off.
CPU: 3 PID: 141 Comm: kworker/3:1 Tainted: GE   4.5.0-rc3+ #6
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
  880456807c38 81263ba7 0001
  880457355960 880456ff9f80 880456807d28
 81072472 8107225a 810a6714 e8fffec71f05
Call Trace:
 [] dump_stack+0x6b/0xa4
 [] process_one_work+0x382/0x480
 [] ? process_one_work+0x16a/0x480
 [] ? __lock_release+0xc4/0x3a0
 [] worker_thread+0x169/0x520
 [] ? complete+0x48/0x60
 [] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [] ? maybe_create_worker+0x110/0x110
 [] ? maybe_create_worker+0x110/0x110
 [] ? schedule+0x42/0xb0
 [] ? maybe_create_worker+0x110/0x110
 [] kthread+0xe4/0x100
 [] ? trace_hardirqs_on+0xd/0x10
 [] ? schedule_tail+0x19/0xd0
 [] ? __init_kthread_worker+0x70/0x70
 [] ret_from_fork+0x3f/0x70
 [] ? __init_kthread_worker+0x70/0x70
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/23] ALUA device handler update, part II

2016-02-11 Thread Bart Van Assche

On 02/10/2016 05:37 PM, Martin K. Petersen wrote:
> "Bart" == Bart Van Assche  writes:
>
> Bart> I will try to free up some time to help with reviewing and testing
> Bart> this patch series. But before I can do that the v4.5-rc multipath
> Bart> code needs to be stabilized first. See also
> Bart> https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.


Hello Martin,

I will start today with testing and reviewing this patch series.

Bart.


Re: [PATCH 00/23] ALUA device handler update, part II

2016-02-10 Thread Martin K. Petersen
> "Bart" == Bart Van Assche  writes:

Bart,

Bart> I will try to free up some time to help with reviewing and testing
Bart> this patch series. But before I can do that the v4.5-rc multipath
Bart> code needs to be stabilized first. See also
Bart> https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.

Have you had time to take a look?

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 00/23] ALUA device handler update, part II

2016-02-08 Thread Bart Van Assche

On 02/08/2016 06:37 AM, Hannes Reinecke wrote:

> And for the impatient I've pushed the entire patchset to my kernel
> repository at kernel.org:
>
> kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git alua-2.v5


Hello Hannes,

I will try to free up some time to help with reviewing and testing this 
patch series. But before I can do that the v4.5-rc multipath code needs 
to be stabilized first. See also 
https://www.redhat.com/archives/dm-devel/2016-February/msg00066.html.


Bart.


Re: [PATCH 00/23] ALUA device handler update, part II

2016-02-08 Thread Hannes Reinecke
On 02/08/2016 03:34 PM, Hannes Reinecke wrote:
> as promised here is now the second part of my ALUA device handler update.
> [...]

And for the impatient I've pushed the entire patchset to my kernel
repository at kernel.org:

kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git alua-2.v5

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


[PATCH 00/23] ALUA device handler update, part II

2016-02-08 Thread Hannes Reinecke
As promised, here is the second part of my ALUA device handler update.
This contains a major rework of the ALUA device handler, as execution is
moved onto a workqueue. This has the advantage that we avoid having to
issue multiple calls to the same LUN (as happens frequently when failing
over a LUN with several paths) and that retries are finally handled correctly.
As some arrays are only capable of handling one STPG at a time, I've added
a blacklist flag which then uses a single-threaded workqueue, thereby
effectively synchronizing STPG handling.
Thanks to Bart for this suggestion.

As usual, comments and reviews are welcome.

Changes to v4:
- use kfree_rcu() as suggested by hch
- Use 'IS_ERR' instead of 'PTR_ERR' when checking for validity
  of a pointer
- Simplify pg assignment as suggested by hch
- Use separate WARN_ON statements as suggested by hch
- Fixes to avoid I/O stall on failover

Changes to v3:
- Use scsi_device flag for blacklisting as suggested by hch
- Add arrays for synchronous ALUA handling
- Move synchronize_rcu() into release_port_group()
- Add remaining reviewed tags

Changes to v2:
- Use a SCSI blacklist flag instead of a hardware handler parameter
  for switching to synchronous ALUA handling
- Move scsi_get_device_flags{,_keyed} to scsi_devinfo.h
- Move flush_delayed_work() into release_port_group()
- Rename alua_lookup_pg() to alua_find_get_pg()
- Add __rcu annotations to keep sparse happy

Changes to v1:
- Include reviews from hch
- Switch to hardware handler parameter instead of module option

Hannes Reinecke (23):
  scsi_dh_alua: Pass buffer as function argument
  scsi_dh_alua: separate out alua_stpg()
  scsi_dh_alua: Make stpg synchronous
  scsi_dh_alua: call alua_rtpg() if stpg fails
  scsi_dh_alua: switch to scsi_execute_req_flags()
  scsi_dh_alua: allocate RTPG buffer separately
  scsi_dh_alua: Use separate alua_port_group structure
  scsi_dh_alua: use unique device id
  scsi_dh_alua: simplify alua_initialize()
  revert commit a8e5a2d593cb ("[SCSI] scsi_dh_alua: ALUA handler attach
should succeed while TPG is transitioning")
  scsi_dh_alua: move optimize_stpg evaluation
  scsi_dh_alua: remove 'rel_port' from alua_dh_data structure
  scsi_dh_alua: Use workqueue for RTPG
  scsi_dh_alua: Allow workqueue to run synchronously
  scsi_dh_alua: Add new blacklist flag 'BLIST_SYNC_ALUA'
  scsi_dh_alua: Recheck state on unit attention
  scsi_dh_alua: update all port states
  scsi_dh_alua: Send TEST UNIT READY to poll for transitioning
  scsi_dh: add 'rescan' callback
  scsi: Add 'access_state' attribute
  scsi_dh_alua: use common definitions for ALUA state
  scsi_dh_alua: update 'access_state' field
  scsi_dh_alua: Update version to 2.0

 drivers/scsi/device_handler/scsi_dh_alua.c | 988 -
 drivers/scsi/scsi_devinfo.c                |   2 +
 drivers/scsi/scsi_lib.c                    |   1 +
 drivers/scsi/scsi_scan.c                   |  12 +-
 drivers/scsi/scsi_sysfs.c                  |  49 ++
 include/scsi/scsi_device.h                 |   2 +
 include/scsi/scsi_devinfo.h                |   1 +
 include/scsi/scsi_dh.h                     |   2 +
 include/scsi/scsi_proto.h                  |  13 +
 9 files changed, 763 insertions(+), 307 deletions(-)

-- 
1.8.5.6
