[PATCH RESEND v11 2/4] block: add runtime pm helpers

2013-03-15 Thread Aaron Lu
From: Lin Ming ming.m@intel.com

Add runtime pm helper functions:

void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
  - Initialization function for drivers to call.

int blk_pre_runtime_suspend(struct request_queue *q)
  - If any requests are in the queue, mark last busy and return -EBUSY.
Otherwise set q-rpm_status to RPM_SUSPENDING and return 0.

void blk_post_runtime_suspend(struct request_queue *q, int err)
  - If the suspend succeeded then set q-rpm_status to RPM_SUSPENDED.
Otherwise set it to RPM_ACTIVE and mark last busy.

void blk_pre_runtime_resume(struct request_queue *q)
  - Set q-rpm_status to RPM_RESUMING.

void blk_post_runtime_resume(struct request_queue *q, int err)
  - If the resume succeeded then set q-rpm_status to RPM_ACTIVE
and call __blk_run_queue, then mark last busy and autosuspend.
Otherwise set q-rpm_status to RPM_SUSPENDED.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsim=133727953625963w=2

Signed-off-by: Lin Ming ming.m@intel.com
Signed-off-by: Aaron Lu aaron...@intel.com
Acked-by: Alan Stern st...@rowland.harvard.edu
---
 block/blk-core.c   | 144 +
 include/linux/blkdev.h |  27 ++
 2 files changed, 171 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 074b758..123d240 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -30,6 +30,7 @@
 #include linux/list_sort.h
 #include linux/delay.h
 #include linux/ratelimit.h
+#include linux/pm_runtime.h
 
 #define CREATE_TRACE_POINTS
 #include trace/events/block.h
@@ -3045,6 +3046,149 @@ void blk_finish_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
+#ifdef CONFIG_PM_RUNTIME
+/**
+ * blk_pm_runtime_init - Block layer runtime PM initialization routine
+ * @q: the queue of the device
+ * @dev: the device the queue belongs to
+ *
+ * Description:
+ *Initialize runtime-PM-related fields for @q and start auto suspend for
+ *@dev. Drivers that want to take advantage of request-based runtime PM
+ *should call this function after @dev has been initialized, and its
+ *request queue @q has been allocated, and runtime PM for it can not happen
+ *yet(either due to disabled/forbidden or its usage_count  0). In most
+ *cases, driver should call this function before any I/O has taken place.
+ *
+ *This function takes care of setting up using auto suspend for the device,
+ *the autosuspend delay is set to -1 to make runtime suspend impossible
+ *until an updated value is either set by user or by driver. Drivers do
+ *not need to touch other autosuspend settings.
+ *
+ *The block layer runtime PM is request based, so only works for drivers
+ *that use request as their IO unit instead of those directly use bio's.
+ */
+void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
+{
+   q-dev = dev;
+   q-rpm_status = RPM_ACTIVE;
+   pm_runtime_set_autosuspend_delay(q-dev, -1);
+   pm_runtime_use_autosuspend(q-dev);
+}
+EXPORT_SYMBOL(blk_pm_runtime_init);
+
+/**
+ * blk_pre_runtime_suspend - Pre runtime suspend check
+ * @q: the queue of the device
+ *
+ * Description:
+ *This function will check if runtime suspend is allowed for the device
+ *by examining if there are any requests pending in the queue. If there
+ *are requests pending, the device can not be runtime suspended; otherwise,
+ *the queue's status will be updated to SUSPENDING and the driver can
+ *proceed to suspend the device.
+ *
+ *For the not allowed case, we mark last busy for the device so that
+ *runtime PM core will try to autosuspend it some time later.
+ *
+ *This function should be called near the start of the device's
+ *runtime_suspend callback.
+ *
+ * Return:
+ *0- OK to runtime suspend the device
+ *-EBUSY   - Device should not be runtime suspended
+ */
+int blk_pre_runtime_suspend(struct request_queue *q)
+{
+   int ret = 0;
+
+   spin_lock_irq(q-queue_lock);
+   if (q-nr_pending) {
+   ret = -EBUSY;
+   pm_runtime_mark_last_busy(q-dev);
+   } else {
+   q-rpm_status = RPM_SUSPENDING;
+   }
+   spin_unlock_irq(q-queue_lock);
+   return ret;
+}
+EXPORT_SYMBOL(blk_pre_runtime_suspend);
+
+/**
+ * blk_post_runtime_suspend - Post runtime suspend processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_suspend function
+ *
+ * Description:
+ *Update the queue's runtime status according to the return value of the
+ *device's runtime suspend function and mark last busy for the device so
+ *that PM core will try to auto suspend the device at a later time.
+ *
+ *This function should be called near the end of the device's
+ *runtime_suspend callback.
+ */
+void blk_post_runtime_suspend(struct request_queue *q, int err)
+{
+   

[PATCH RESEND v11 4/4] sd: change to auto suspend mode

2013-03-15 Thread Aaron Lu
From: Lin Ming ming.m@intel.com

Uses block layer runtime pm helper functions in
scsi_runtime_suspend/resume for devices that take advantage of it.

Remove scsi_autopm_* from sd open/release path and check_events path.

Signed-off-by: Lin Ming ming.m@intel.com
Signed-off-by: Aaron Lu aaron...@intel.com
Acked-by: Alan Stern st...@rowland.harvard.edu
---
 drivers/scsi/scsi_pm.c | 84 ++
 drivers/scsi/sd.c  | 13 +---
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 8f6b12c..42539ee 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -144,33 +144,83 @@ static int scsi_bus_restore(struct device *dev)
 
 #ifdef CONFIG_PM_RUNTIME
 
+static int sdev_blk_runtime_suspend(struct scsi_device *sdev,
+   int (*cb)(struct device *))
+{
+   int err;
+
+   err = blk_pre_runtime_suspend(sdev-request_queue);
+   if (err)
+   return err;
+   if (cb)
+   err = cb(sdev-sdev_gendev);
+   blk_post_runtime_suspend(sdev-request_queue, err);
+
+   return err;
+}
+
+static int sdev_runtime_suspend(struct device *dev)
+{
+   const struct dev_pm_ops *pm = dev-driver ? dev-driver-pm : NULL;
+   int (*cb)(struct device *) = pm ? pm-runtime_suspend : NULL;
+   struct scsi_device *sdev = to_scsi_device(dev);
+   int err;
+
+   if (sdev-request_queue-dev)
+   return sdev_blk_runtime_suspend(sdev, cb);
+
+   err = scsi_dev_type_suspend(dev, cb);
+   if (err == -EAGAIN)
+   pm_schedule_suspend(dev, jiffies_to_msecs(
+   round_jiffies_up_relative(HZ/10)));
+   return err;
+}
+
 static int scsi_runtime_suspend(struct device *dev)
 {
int err = 0;
-   const struct dev_pm_ops *pm = dev-driver ? dev-driver-pm : NULL;
 
dev_dbg(dev, scsi_runtime_suspend\n);
-   if (scsi_is_sdev_device(dev)) {
-   err = scsi_dev_type_suspend(dev,
-   pm ? pm-runtime_suspend : NULL);
-   if (err == -EAGAIN)
-   pm_schedule_suspend(dev, jiffies_to_msecs(
-   round_jiffies_up_relative(HZ/10)));
-   }
+   if (scsi_is_sdev_device(dev))
+   err = sdev_runtime_suspend(dev);
 
/* Insert hooks here for targets, hosts, and transport classes */
 
return err;
 }
 
-static int scsi_runtime_resume(struct device *dev)
+static int sdev_blk_runtime_resume(struct scsi_device *sdev,
+   int (*cb)(struct device *))
 {
int err = 0;
+
+   blk_pre_runtime_resume(sdev-request_queue);
+   if (cb)
+   err = cb(sdev-sdev_gendev);
+   blk_post_runtime_resume(sdev-request_queue, err);
+
+   return err;
+}
+
+static int sdev_runtime_resume(struct device *dev)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
const struct dev_pm_ops *pm = dev-driver ? dev-driver-pm : NULL;
+   int (*cb)(struct device *) = pm ? pm-runtime_resume : NULL;
+
+   if (sdev-request_queue-dev)
+   return sdev_blk_runtime_resume(sdev, cb);
+   else
+   return scsi_dev_type_resume(dev, cb);
+}
+
+static int scsi_runtime_resume(struct device *dev)
+{
+   int err = 0;
 
dev_dbg(dev, scsi_runtime_resume\n);
if (scsi_is_sdev_device(dev))
-   err = scsi_dev_type_resume(dev, pm ? pm-runtime_resume : NULL);
+   err = sdev_runtime_resume(dev);
 
/* Insert hooks here for targets, hosts, and transport classes */
 
@@ -185,10 +235,18 @@ static int scsi_runtime_idle(struct device *dev)
 
/* Insert hooks here for targets, hosts, and transport classes */
 
-   if (scsi_is_sdev_device(dev))
-   err = pm_schedule_suspend(dev, 100);
-   else
+   if (scsi_is_sdev_device(dev)) {
+   struct scsi_device *sdev = to_scsi_device(dev);
+
+   if (sdev-request_queue-dev) {
+   pm_runtime_mark_last_busy(dev);
+   err = pm_runtime_autosuspend(dev);
+   } else {
+   err = pm_runtime_suspend(dev);
+   }
+   } else {
err = pm_runtime_suspend(dev);
+   }
return err;
 }
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c6e2b34..5000bec 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1121,10 +1121,6 @@ static int sd_open(struct block_device *bdev, fmode_t 
mode)
 
sdev = sdkp-device;
 
-   retval = scsi_autopm_get_device(sdev);
-   if (retval)
-   goto error_autopm;
-
/*
 * If the device is in error recovery, wait until it is done.
 * If the device is offline, then disallow any access to it.
@@ -1169,8 +1165,6 @@ static int sd_open(struct block_device *bdev, fmode_t 

[PATCH RESEND v11 3/4] block: implement runtime pm strategy

2013-03-15 Thread Aaron Lu
From: Lin Ming ming.m@intel.com

When a request is added:
If device is suspended or is suspending and the request is not a
PM request, resume the device.

When the last request finishes:
Call pm_runtime_mark_last_busy().

When pick a request:
If device is resuming/suspending, then only PM request is allowed
to go.

The idea and API is designed by Alan Stern and described here:
http://marc.info/?l=linux-scsim=133727953625963w=2

Signed-off-by: Lin Ming ming.m@intel.com
Signed-off-by: Aaron Lu aaron...@intel.com
Acked-by: Alan Stern st...@rowland.harvard.edu
---
 block/blk-core.c | 39 +++
 block/elevator.c | 26 ++
 2 files changed, 65 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 123d240..441f348 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1264,6 +1264,16 @@ void part_round_stats(int cpu, struct hd_struct *part)
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
+#ifdef CONFIG_PM_RUNTIME
+static void blk_pm_put_request(struct request *rq)
+{
+   if (rq-q-dev  !(rq-cmd_flags  REQ_PM)  !--rq-q-nr_pending)
+   pm_runtime_mark_last_busy(rq-q-dev);
+}
+#else
+static inline void blk_pm_put_request(struct request *rq) {}
+#endif
+
 /*
  * queue lock must be held
  */
@@ -1274,6 +1284,8 @@ void __blk_put_request(struct request_queue *q, struct 
request *req)
if (unlikely(--req-ref_count))
return;
 
+   blk_pm_put_request(req);
+
elv_completed_request(q, req);
 
/* this is a bio leak */
@@ -2053,6 +2065,28 @@ static void blk_account_io_done(struct request *req)
}
 }
 
+#ifdef CONFIG_PM_RUNTIME
+/*
+ * Don't process normal requests when queue is suspended
+ * or in the process of suspending/resuming
+ */
+static struct request *blk_pm_peek_request(struct request_queue *q,
+  struct request *rq)
+{
+   if (q-dev  (q-rpm_status == RPM_SUSPENDED ||
+   (q-rpm_status != RPM_ACTIVE  !(rq-cmd_flags  REQ_PM
+   return NULL;
+   else
+   return rq;
+}
+#else
+static inline struct request *blk_pm_peek_request(struct request_queue *q,
+ struct request *rq)
+{
+   return rq;
+}
+#endif
+
 /**
  * blk_peek_request - peek at the top of a request queue
  * @q: request queue to peek at
@@ -2075,6 +2109,11 @@ struct request *blk_peek_request(struct request_queue *q)
int ret;
 
while ((rq = __elv_next_request(q)) != NULL) {
+
+   rq = blk_pm_peek_request(q, rq);
+   if (!rq)
+   break;
+
if (!(rq-cmd_flags  REQ_STARTED)) {
/*
 * This is the first time the device driver
diff --git a/block/elevator.c b/block/elevator.c
index a0ffdd9..eba5b04 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -34,6 +34,7 @@
 #include linux/blktrace_api.h
 #include linux/hash.h
 #include linux/uaccess.h
+#include linux/pm_runtime.h
 
 #include trace/events/block.h
 
@@ -536,6 +537,27 @@ void elv_bio_merged(struct request_queue *q, struct 
request *rq,
e-type-ops.elevator_bio_merged_fn(q, rq, bio);
 }
 
+#ifdef CONFIG_PM_RUNTIME
+static void blk_pm_requeue_request(struct request *rq)
+{
+   if (rq-q-dev  !(rq-cmd_flags  REQ_PM))
+   rq-q-nr_pending--;
+}
+
+static void blk_pm_add_request(struct request_queue *q, struct request *rq)
+{
+   if (q-dev  !(rq-cmd_flags  REQ_PM)  q-nr_pending++ == 0 
+   (q-rpm_status == RPM_SUSPENDED || q-rpm_status == RPM_SUSPENDING))
+   pm_request_resume(q-dev);
+}
+#else
+static inline void blk_pm_requeue_request(struct request *rq) {}
+static inline void blk_pm_add_request(struct request_queue *q,
+ struct request *rq)
+{
+}
+#endif
+
 void elv_requeue_request(struct request_queue *q, struct request *rq)
 {
/*
@@ -550,6 +572,8 @@ void elv_requeue_request(struct request_queue *q, struct 
request *rq)
 
rq-cmd_flags = ~REQ_STARTED;
 
+   blk_pm_requeue_request(rq);
+
__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
 }
 
@@ -572,6 +596,8 @@ void __elv_add_request(struct request_queue *q, struct 
request *rq, int where)
 {
trace_block_rq_insert(q, rq);
 
+   blk_pm_add_request(q, rq);
+
rq-q = q;
 
if (rq-cmd_flags  REQ_SOFTBARRIER) {
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND v11 0/4] block layer runtime pm

2013-03-15 Thread Aaron Lu
In August 2010, Jens and Alan discussed about Runtime PM and the block
layer. http://marc.info/?t=12825910841r=1w=2
And then Alan has given a detailed implementation guide:
http://marc.info/?l=linux-scsim=133727953625963w=2

To test:
# ls -l /sys/block/sda
/sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda

# echo 1  
/sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/autosuspend_delay_ms
# echo auto  
/sys/devices/pci:00/:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/power/control
Then you'll see sda is suspended after 10secs idle.

[ 1767.680192] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 1767.680317] sd 2:0:0:0: [sda] Stopping disk

And if you do some IO, it will resume immediately.
[ 1791.052438] sd 2:0:0:0: [sda] Starting disk

For test, I often set the autosuspend time to 1 second. If you are using
a GUI, the 10 seconds delay may be too long that the disk can not enter
runtime suspended state.

Note that sd's runtime suspend callback will dump some kernel messages
and the syslog daemon will write kernel message to /var/log/messages,
making the disk instantly resume after suspended. So for test, the
syslog daemon should better be temporarily stopped.

A git repo for it, on top of v3.9-rc1:
https://github.com/aaronlu/linux.git blockpm

v11:
- Add Alan Stern's Acked-by tag.

v10:
- Add link of Alan Stern's ideas on block layer runtime PM to patch 2
  and 3's changelog;
- Add back code to schdule device suspend if scsi driver return -EBUSY.

v9:
- No need to mark last busy and autosuspend in blk_pm_runtime_init as
  suggested by Alan Stern;
- mark last busy in blk_runtime_post_suspend if driver failed to runtime
  suspend the device, so that PM core can try to autosuspend it some
  time later;
- Update scsi bus layer runtime callback to handle scsi devices which
  use request based runtime PM and which don't.

v8:
- Set default autosuspend delay to -1 to avoid suspend till an updated
  value is set as suggested by Alan Stern;
- Always check the dev field of the queue structure, as it is incorrect
  and meaningless to do any operation on devices that do not use block
  layer runtime PM as reminded by Alan Stern;
- Update scsi bus level runtime PM callback to take care of scsi devices
  that use block layer runtime PM and that don't.

v7:
- Add kernel doc for block layer runtime PM API as suggested by
  Alan Stern;

- Add back check for q-dev, as that serves as a flag if driver
  is using block layer runtime PM;

- Do not auto suspend when last request is finished, as that's a hot
  path and auto suspend is not a trivial function. Instead, mark last
  busy in pre_suspend so that runtim PM core will retry suspend some
  time later to solve the 1st problem demostrated in v6, suggested by
  Alan Stern.

- Move block layer runtime PM strtegy functions to where they are
  needed instead of in include/linux/blkdev.h as suggested by Alan
  Stern since clients of block layer do not need to know those
  functions.

v6:
Take over from Lin Ming.

- Instead of put the device into autosuspend state in
  blk_post_runtime_suspend, do it when the last request is finished.
  This can also solve the problem illustrated below:

  thread Athread B
|suspend timer expired  |
|  ... ...  |a new request comes in,
|  ... ...  |blk_pm_add_request
|  ... ...  |skip request_resume due to
|  ... ...  |q-status is still RPM_ACTIVE
|  rpm_suspend  |  ... ...
|scsi_runtime_suspend   |  ... ...
|  blk_pre_runtime_suspend  |  ... ...
|  return -EBUSY due to nr_pending  |  ... ...
|  rpm_suspend done |  ... ...
|   |blk_pm_put_request, mark last busy

But no more trigger point, and the device will stay at RPM_ACTIVE state.
Run pm_runtime_autosuspend after the last request is finished solved
this problem.

- Requests which have the REQ_PM flag should not involve nr_pending
  counting, or we may lose the condition to resume the device:
  Suppose queue is active and nr_pending is 0. Then a REQ_PM request
  comes and nr_pending will be increased to 1, but since the request has
  REQ_PM flag, it will not cause resume. Before it is finished, a normal
  request comes in, and since nr_pending is 1 now, it will not trigger
  the resume of the device either. Bug.

- Do not quiesce the device in scsi bus level runtime suspend callback.
  Since the only reason the device is to be runtime suspended is due to
  no more requests pending for it, quiesce it is pointless.

- Remove scsi_autopm_* from sd_check_events as we are request driven.

- Call blk_pm_runtime_init in scsi_sysfs_initialize_dev, so that we do
  not need to check queue's device in blk_pm_add/put_request.

- Do not mark last busy and initiate an 

[PATCH RESEND v11 1/4] block: add a flag to identify PM request

2013-03-15 Thread Aaron Lu
From: Lin Ming ming.m@intel.com

Add a flag REQ_PM to identify the request is PM related, such requests
will not change the device request queue's runtime status. It is
intended to be used in driver's runtime PM callback, so that driver can
perform some IO to the device there with the queue's runtime status
unaffected. e.g. in SCSI disk's runtime suspend callback, the disk will
be put into stopped power state, and this require sending a command to
the device. Such command processing should not change the disk's runtime
status.

As an example, modify scsi code to use this flag.

Signed-off-by: Lin Ming ming.m@intel.com
Signed-off-by: Aaron Lu aaron...@intel.com
Acked-by: Alan Stern st...@rowland.harvard.edu
---
 drivers/scsi/scsi_lib.c|  9 -
 drivers/scsi/sd.c  |  9 +
 include/linux/blk_types.h  |  2 ++
 include/scsi/scsi_device.h | 16 
 4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 765398c..23f795f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -271,11 +271,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned 
char *cmd,
 }
 EXPORT_SYMBOL(scsi_execute);
 
-
-int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
+int scsi_execute_req_flags(struct scsi_device *sdev, const unsigned char *cmd,
 int data_direction, void *buffer, unsigned bufflen,
 struct scsi_sense_hdr *sshdr, int timeout, int retries,
-int *resid)
+int *resid, int flags)
 {
char *sense = NULL;
int result;
@@ -286,14 +285,14 @@ int scsi_execute_req(struct scsi_device *sdev, const 
unsigned char *cmd,
return DRIVER_ERROR  24;
}
result = scsi_execute(sdev, cmd, data_direction, buffer, bufflen,
- sense, timeout, retries, 0, resid);
+ sense, timeout, retries, flags, resid);
if (sshdr)
scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, sshdr);
 
kfree(sense);
return result;
 }
-EXPORT_SYMBOL(scsi_execute_req);
+EXPORT_SYMBOL(scsi_execute_req_flags);
 
 /*
  * Function:scsi_init_cmd_errh()
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..c6e2b34 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1424,8 +1424,9 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
 * Leave the rest of the command zero to indicate
 * flush everything.
 */
-   res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, sshdr,
-  SD_FLUSH_TIMEOUT, SD_MAX_RETRIES, NULL);
+   res = scsi_execute_req_flags(sdp, cmd, DMA_NONE, NULL, 0,
+sshdr, SD_FLUSH_TIMEOUT,
+SD_MAX_RETRIES, NULL, REQ_PM);
if (res == 0)
break;
}
@@ -3021,8 +3022,8 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, 
int start)
if (!scsi_device_online(sdp))
return -ENODEV;
 
-   res = scsi_execute_req(sdp, cmd, DMA_NONE, NULL, 0, sshdr,
-  SD_TIMEOUT, SD_MAX_RETRIES, NULL);
+   res = scsi_execute_req_flags(sdp, cmd, DMA_NONE, NULL, 0, sshdr,
+  SD_TIMEOUT, SD_MAX_RETRIES, NULL, REQ_PM);
if (res) {
sd_printk(KERN_WARNING, sdkp, START_STOP FAILED\n);
sd_print_result(sdkp, res);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cdf1119..fcc1ce2 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -175,6 +175,7 @@ enum rq_flag_bits {
__REQ_IO_STAT,  /* account I/O stat */
__REQ_MIXED_MERGE,  /* merge of different types, fail separately */
__REQ_KERNEL,   /* direct IO to kernel pages */
+   __REQ_PM,   /* runtime pm request */
__REQ_NR_BITS,  /* stops here */
 };
 
@@ -223,5 +224,6 @@ enum rq_flag_bits {
 #define REQ_MIXED_MERGE(1  __REQ_MIXED_MERGE)
 #define REQ_SECURE (1  __REQ_SECURE)
 #define REQ_KERNEL (1  __REQ_KERNEL)
+#define REQ_PM (1  __REQ_PM)
 
 #endif /* __LINUX_BLK_TYPES_H */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index a7f9cba..cc64587 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -394,10 +394,18 @@ extern int scsi_execute(struct scsi_device *sdev, const 
unsigned char *cmd,
int data_direction, void *buffer, unsigned bufflen,
unsigned char *sense, int timeout, int retries,
int flag, int *resid);
-extern int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd,
- 

Re: [PATCH v2] st: Take additional queue ref in st_probe

2013-03-15 Thread Jan Vesely


On 05/03/13 16:57, Joe Lawrence wrote:
 Changes from v1:
   Corrected error paths as noted by Ewan Milne and Jan Vesely.

thanks,

Acked-by: Jan Vesely jves...@redhat.com

 
 These changes were applied to scsi.git, branch misc.  This patch
 fixes a reference count bug in the SCSI tape driver which can be
 reproduced with the following:
 
 * Boot with slub_debug=FZPU, tape drive attached
 * echo 1  /sys/devices/... tape device pci path .../remove
 * Wait for device removal
 * echo 1  /sys/kernel/slab/blkdev_queue/validate
 * Slub debug complains about corrupted poison pattern
 
 In commit 523e1d39 (block: make gendisk hold a reference to its queue) 
 add_disk() and disk_release() were modified to get/put an additional
 reference on a disk queue to fix a reference counting discrepency
 between bdev release and SCSI device removal.  The ST driver never
 calls add_disk(), so this commit introduced an extra kref put when the
 ST driver frees its struct gendisk.
 
 Attempts were made to fix this bug at the block level [1] but later
 abandoned due to floppy driver issues [2].
 
 [1] https://lkml.org/lkml/2012/8/27/354
 [2] https://lkml.org/lkml/2012/9/22/113
 
 From a50a6ee28748b7c1620af6f76772164ec0fc4a1d Mon Sep 17 00:00:00 2001
 From: Joe Lawrence joe.lawre...@stratus.com
 Date: Tue, 5 Mar 2013 09:30:14 -0500
 Subject: [PATCH v2] st: Take additional queue ref in st_probe
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 The SCSI tape driver employs a struct gendisk, calling alloc_disk() to
 create an instance, but does not register it via add_disk().  When the
 gendisk is torn down, disk_release() is called and expects to return a
 disk queue reference that add_disk() normally would have taken out. (See
 commit 523e1d39.)  Fix the kref accounting by adding a blk_get_queue()
 to st_probe().
 
 Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
 Tested-by: Ewan D. Milne emi...@redhat.com 
 Cc: Kai Mäkisara kai.makis...@kolumbus.fi
 Cc: James E.J. Bottomley jbottom...@parallels.com
 Cc: Seymour, Shane M shane.seym...@hp.com
 Cc: Jan Vesely jves...@redhat.com
 Cc: linux-scsi@vger.kernel.org
 ---
  drivers/scsi/st.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
 index 98156a9..96f3363 100644
 --- a/drivers/scsi/st.c
 +++ b/drivers/scsi/st.c
 @@ -4112,6 +4112,10 @@ static int st_probe(struct device *dev)
   tpnt-disk = disk;
   disk-private_data = tpnt-driver;
   disk-queue = SDp-request_queue;
 + /* SCSI tape doesn't register this gendisk via add_disk().  Manually
 +  * take queue reference that release_disk() expects. */
 + if (!blk_get_queue(disk-queue))
 + goto out_put_disk;
   tpnt-driver = st_template;
  
   tpnt-device = SDp;
 @@ -4181,7 +4185,7 @@ static int st_probe(struct device *dev)
   if (!idr_pre_get(st_index_idr, GFP_KERNEL)) {
   pr_warn(st: idr expansion failed\n);
   error = -ENOMEM;
 - goto out_put_disk;
 + goto out_put_queue;
   }
  
   spin_lock(st_index_lock);
 @@ -4189,7 +4193,7 @@ static int st_probe(struct device *dev)
   spin_unlock(st_index_lock);
   if (error) {
   pr_warn(st: idr allocation failed: %d\n, error);
 - goto out_put_disk;
 + goto out_put_queue;
   }
  
   if (dev_num  ST_MAX_TAPES) {
 @@ -4222,6 +4226,8 @@ out_put_index:
   spin_lock(st_index_lock);
   idr_remove(st_index_idr, dev_num);
   spin_unlock(st_index_lock);
 +out_put_queue:
 + blk_put_queue(disk-queue);
  out_put_disk:
   put_disk(disk);
   kfree(tpnt);
 

-- 
Jan Vesely jves...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -next] target: fix possible memory leak in core_tpg_register()

2013-03-15 Thread Wei Yongjun
From: Wei Yongjun yongjun_...@trendmicro.com.cn

'se_tpg-tpg_lun_list' is malloced in core_tpg_register() and should be freed
before leaving from the error handling cases, otherwise it will cause memory
leak. 
'se_tpg' is malloced out of this function, and will be freed if we return 
error, so
remove free for 'se_tpg'.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/target/target_core_tpg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index 9169d6a..aac9d27 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -711,7 +711,8 @@ int core_tpg_register(
 
if (se_tpg-se_tpg_type == TRANSPORT_TPG_TYPE_NORMAL) {
if (core_tpg_setup_virtual_lun0(se_tpg)  0) {
-   kfree(se_tpg);
+   array_free(se_tpg-tpg_lun_list,
+  TRANSPORT_MAX_LUNS_PER_TPG);
return -ENOMEM;
}
}


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RFC] scsi: Use W_LUN for scanning

2013-03-15 Thread Hannes Reinecke
SAM advertises the use of a Well-known LUN (W_LUN) for scanning.
As this avoids exposing LUN 0 (which might be a valid LUN) for
all initiators it is the preferred method for LUN scanning on
some arrays.
So we should be using W_LUN for scanning, too. If the W_LUN is
not supported we'll fall back to use LUN 0.
For broken W_LUN implementations a new blacklist flag
'BLIST_NO_WLUN' is added.

Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..f4ccdea 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1312,6 +1312,7 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
unsigned int num_luns;
unsigned int retries;
int result;
+   int w_lun = SCSI_W_LUN_REPORT_LUNS;
struct scsi_lun *lunp, *lun_data;
u8 *data;
struct scsi_sense_hdr sshdr;
@@ -1337,11 +1338,20 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
return 0;
if (starget-no_report_luns)
return 1;
+   if (bflags  BLIST_NO_WLUN)
+   w_lun = 0;
 
+retry_report_lun_scan:
if (!(sdev = scsi_device_lookup_by_target(starget, 0))) {
-   sdev = scsi_alloc_sdev(starget, 0, NULL);
-   if (!sdev)
-   return 0;
+   sdev = scsi_alloc_sdev(starget, w_lun, NULL);
+   if (!sdev) {
+   if (w_lun != 0) {
+   w_lun = 0;
+   sdev = scsi_alloc_sdev(starget, w_lun, NULL);
+   }
+   if (!sdev)
+   return 0;
+   }
if (scsi_device_get(sdev)) {
__scsi_remove_device(sdev);
return 0;
@@ -1418,6 +1428,18 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
}
 
if (result) {
+   if (w_lun != 0  scsi_device_created(sdev)) {
+   /*
+* W_LUN probably not supported, try with LUN 0
+*/
+   SCSI_LOG_SCAN_BUS(3, printk (KERN_INFO scsi scan:
+   W_LUN not supported, try LUN 0\n));
+   kfree(lun_data);
+   scsi_device_put(sdev);
+   __scsi_remove_device(sdev);
+   w_lun = 0;
+   goto retry_report_lun_scan;
+   }
/*
 * The device probably does not support a REPORT LUN command
 */
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index cc1f3e7..ffb42b1 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -31,4 +31,5 @@
 #define BLIST_MAX_512  0x80 /* maximum 512 sector cdb length */
 #define BLIST_ATTACH_PQ3   0x100 /* Scan: Attach to PQ3 devices */
 #define BLIST_NO_DIF   0x200 /* Disable T10 PI (DIF) */
+#define BLIST_NO_WLUN  0x400 /* Disable W_LUN scanning */
 #endif
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -next] [SCSI] csiostor: remove unused variable in csio_process_fwevtq_entry()

2013-03-15 Thread Wei Yongjun
From: Wei Yongjun yongjun_...@trendmicro.com.cn

The variable 'data' is initialized but never used
otherwise, so remove the unused variable.

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/scsi/csiostor/csio_hw.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_hw.c b/drivers/scsi/csiostor/csio_hw.c
index a0b4c89..1936055 100644
--- a/drivers/scsi/csiostor/csio_hw.c
+++ b/drivers/scsi/csiostor/csio_hw.c
@@ -3565,7 +3565,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
uint32_t len,
  struct csio_fl_dma_buf *flb, void *priv)
 {
__u8 op;
-   __be64 *data;
void *msg = NULL;
uint32_t msg_len = 0;
bool msg_sg = 0;
@@ -3581,8 +3580,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
uint32_t len,
msg = (void *) flb;
msg_len = flb-totlen;
msg_sg = 1;
-
-   data = (__be64 *) msg;
} else if (op == CPL_FW6_MSG || op == CPL_FW4_MSG) {
 
CSIO_INC_STATS(hw, n_cpl_fw6_msg);
@@ -3590,8 +3587,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
uint32_t len,
msg = (void *)((uintptr_t)wr + sizeof(__be64));
msg_len = (op == CPL_FW6_MSG) ? sizeof(struct cpl_fw6_msg) :
   sizeof(struct cpl_fw4_msg);
-
-   data = (__be64 *) msg;
} else {
csio_warn(hw, unexpected CPL %#x on FW event queue\n, op);
CSIO_INC_STATS(hw, n_cpl_unexp);


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] [SCSI] csiostor: fix error return code in csio_hw_init()

2013-03-15 Thread Naresh Kumar Inna
On 3/15/2013 2:55 PM, Wei Yongjun wrote:
 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Fix to return a negative error code from the error handling
 case instead of 0, as returned elsewhere in this function.
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
 ---
  drivers/scsi/csiostor/csio_hw.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/drivers/scsi/csiostor/csio_hw.c b/drivers/scsi/csiostor/csio_hw.c
 index a0b4c89..d10afeb 100644
 --- a/drivers/scsi/csiostor/csio_hw.c
 +++ b/drivers/scsi/csiostor/csio_hw.c
 @@ -4036,6 +4036,7 @@ csio_hw_init(struct csio_hw *hw)
   evt_entry = kzalloc(sizeof(struct csio_evt_msg), GFP_KERNEL);
   if (!evt_entry) {
   csio_err(hw, Failed to initialize eventq);
 + rv = -ENOMEM;
   goto err_evtq_cleanup;
   }
  
 

Acked-by: Naresh Kumar Inna nar...@chelsio.com

Thanks,
Naresh.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] [SCSI] csiostor: remove unused variable in csio_process_fwevtq_entry()

2013-03-15 Thread Naresh Kumar Inna
On 3/15/2013 3:31 PM, Wei Yongjun wrote:
 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 The variable 'data' is initialized but never used
 otherwise, so remove the unused variable.
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
 ---
  drivers/scsi/csiostor/csio_hw.c | 5 -
  1 file changed, 5 deletions(-)
 
 diff --git a/drivers/scsi/csiostor/csio_hw.c b/drivers/scsi/csiostor/csio_hw.c
 index a0b4c89..1936055 100644
 --- a/drivers/scsi/csiostor/csio_hw.c
 +++ b/drivers/scsi/csiostor/csio_hw.c
 @@ -3565,7 +3565,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
 uint32_t len,
 struct csio_fl_dma_buf *flb, void *priv)
  {
   __u8 op;
 - __be64 *data;
   void *msg = NULL;
   uint32_t msg_len = 0;
   bool msg_sg = 0;
 @@ -3581,8 +3580,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
 uint32_t len,
   msg = (void *) flb;
   msg_len = flb-totlen;
   msg_sg = 1;
 -
 - data = (__be64 *) msg;
   } else if (op == CPL_FW6_MSG || op == CPL_FW4_MSG) {
  
   CSIO_INC_STATS(hw, n_cpl_fw6_msg);
 @@ -3590,8 +3587,6 @@ csio_process_fwevtq_entry(struct csio_hw *hw, void *wr, 
 uint32_t len,
   msg = (void *)((uintptr_t)wr + sizeof(__be64));
   msg_len = (op == CPL_FW6_MSG) ? sizeof(struct cpl_fw6_msg) :
  sizeof(struct cpl_fw4_msg);
 -
 - data = (__be64 *) msg;
   } else {
   csio_warn(hw, unexpected CPL %#x on FW event queue\n, op);
   CSIO_INC_STATS(hw, n_cpl_unexp);
 
 

Acked-by: Naresh Kumar Inna nar...@chelsio.com

Thanks,
Naresh.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Hannes Reinecke

On 03/14/2013 07:09 PM, Steffen Maier wrote:

Just for my understanding:
So this is a bit like writing 0 into the failed attribute of a
zfcp_port in sysfs (zfcp_sysfs_port_failed_store()) which forces a
port reopen recovery including a sequence of fc_remote_port_delete
and fc_remote_port_add?

Sorta.
This patch simulates the result of an RSCN where this remote port is 
missing. So the fast_io_fail / dev_loss_tmo mechanism is triggered 
(until further notice).
It'll be reset with the next RSCN or by manually resetting the port 
state.



If so, it sounds good to have this functionality for any FC LLD in
common code.


That's why I posted this code :-)
Rationale for this patch is a weird test case with brocade switches;
there you can actually disable a _target_ port. So the port isn't
reachable anymore but no RSCN is send.
And the LLDD is forced into error recovery which'll take _ages_ as 
each and every command send during error recovery will time out.



(And I can think about deprecating parts of zfcp code, next we could
consider the same for zfcp's forced LUN and adapter recovery or
maybe this already exists as sdev's writable state attribute and the
adapter recovery can be performed manually by walking all fc_rports
writing their port_state.)

I don't think the sdev 'state' attribute is useable here; it doesn't 
trigger a rediscovery. You'd want to use 'rescan' here.

But yeah, that would be one of the goals.

Still waiting for some response to the patch, though.
Testing has been successful on our side, plus I've posted
a patchset for multipath-tools utilizing this patch.

So it would be good if it could be included.

James?

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 11:55 AM, Hannes Reinecke wrote:

Rationale for this patch is a weird test case with brocade switches;
there you can actually disable a _target_ port. So the port isn't
reachable anymore but no RSCN is send.


I think it's more than a pure test-case; using the rscnsupr feature on 
the Brocades is just a handy way to trigger it.


I've seen numerous cases in the last few years of fabric failures that 
had this characteristic - either because of hardware failures in the 
switches or due to bugs that caused FC traffic to be blackholed without 
an RSCN or other indication (beyond commands timing out).


This (and the I_T nexus reset you proposed in December which is very 
effective at short-ciruiting the eh escalation in the same situation) 
both make the kernel more robust in the face of this kind of problem.


Regards,
Bryn.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help 
if you could tell with which LLDD you ran into this issue and with which 
values of fast_io_fail_tmo and dev_loss_tmo.


Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:24 PM, Bart Van Assche wrote:

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which LLDD you ran into this issue and with which
values of fast_io_fail_tmo and dev_loss_tmo.


Most of the cases I've seen have involved lpfc (although I don't think 
it's in any way exclusive to that LLDD). Even with very low 
fast_io_fail_timeout/dev_loss_timeout (5/10) the eh is busy for 10m or 
longer before IO fails and multipath is able to react to the problem.


Regards,
Bryn.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 13:37, Bryn M. Reeves wrote:

On 03/15/2013 12:24 PM, Bart Van Assche wrote:

On 03/15/13 12:55, Hannes Reinecke wrote:

And the LLDD is forced into error recovery which'll take _ages_ as each
and every command send during error recovery will time out.


Hello Hannes,

I'm analyzing a related but not identical issue with SRP. It would help
if you could tell with which LLDD you ran into this issue and with which
values of fast_io_fail_tmo and dev_loss_tmo.


Most of the cases I've seen have involved lpfc (although I don't think
it's in any way exclusive to that LLDD). Even with very low
fast_io_fail_timeout/dev_loss_timeout (5/10) the eh is busy for 10m or
longer before IO fails and multipath is able to react to the problem.


The SCSI EH keeps trying until all outstanding request have been 
finished. Does lpfc_host_reset_handler() invoke scsi_done() for 
outstanding requests ? If not, how about modifying 
lpfc_host_reset_handler() such that it finishes all outstanding requests 
if the remote port is not reachable ?


Bart.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] st: Take additional queue ref in st_probe

2013-03-15 Thread Ewan Milne
On Tue, 2013-03-05 at 10:57 -0500, Joe Lawrence wrote:
 Changes from v1:
   Corrected error paths as noted by Ewan Milne and Jan Vesely.

Acked-by: Ewan D. Milne emi...@redhat.com

 
 These changes were applied to scsi.git, branch misc.  This patch
 fixes a reference count bug in the SCSI tape driver which can be
 reproduced with the following:
 
 * Boot with slub_debug=FZPU, tape drive attached
 * echo 1  /sys/devices/... tape device pci path .../remove
 * Wait for device removal
 * echo 1  /sys/kernel/slab/blkdev_queue/validate
 * Slub debug complains about corrupted poison pattern
 
 In commit 523e1d39 (block: make gendisk hold a reference to its queue) 
 add_disk() and disk_release() were modified to get/put an additional
 reference on a disk queue to fix a reference counting discrepency
 between bdev release and SCSI device removal.  The ST driver never
 calls add_disk(), so this commit introduced an extra kref put when the
 ST driver frees its struct gendisk.
 
 Attempts were made to fix this bug at the block level [1] but later
 abandoned due to floppy driver issues [2].
 
 [1] https://lkml.org/lkml/2012/8/27/354
 [2] https://lkml.org/lkml/2012/9/22/113
 
 From a50a6ee28748b7c1620af6f76772164ec0fc4a1d Mon Sep 17 00:00:00 2001
 From: Joe Lawrence joe.lawre...@stratus.com
 Date: Tue, 5 Mar 2013 09:30:14 -0500
 Subject: [PATCH v2] st: Take additional queue ref in st_probe
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
 Content-Transfer-Encoding: 8bit
 
 The SCSI tape driver employs a struct gendisk, calling alloc_disk() to
 create an instance, but does not register it via add_disk().  When the
 gendisk is torn down, disk_release() is called and expects to return a
 disk queue reference that add_disk() normally would have taken out. (See
 commit 523e1d39.)  Fix the kref accounting by adding a blk_get_queue()
 to st_probe().
 
 Signed-off-by: Joe Lawrence joe.lawre...@stratus.com
 Tested-by: Ewan D. Milne emi...@redhat.com 
 Cc: Kai Mäkisara kai.makis...@kolumbus.fi
 Cc: James E.J. Bottomley jbottom...@parallels.com
 Cc: Seymour, Shane M shane.seym...@hp.com
 Cc: Jan Vesely jves...@redhat.com
 Cc: linux-scsi@vger.kernel.org
 ---
  drivers/scsi/st.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
 index 98156a9..96f3363 100644
 --- a/drivers/scsi/st.c
 +++ b/drivers/scsi/st.c
 @@ -4112,6 +4112,10 @@ static int st_probe(struct device *dev)
   tpnt-disk = disk;
   disk-private_data = tpnt-driver;
   disk-queue = SDp-request_queue;
 + /* SCSI tape doesn't register this gendisk via add_disk().  Manually
 +  * take queue reference that release_disk() expects. */
 + if (!blk_get_queue(disk-queue))
 + goto out_put_disk;
   tpnt-driver = st_template;
  
   tpnt-device = SDp;
 @@ -4181,7 +4185,7 @@ static int st_probe(struct device *dev)
   if (!idr_pre_get(st_index_idr, GFP_KERNEL)) {
   pr_warn(st: idr expansion failed\n);
   error = -ENOMEM;
 - goto out_put_disk;
 + goto out_put_queue;
   }
  
   spin_lock(st_index_lock);
 @@ -4189,7 +4193,7 @@ static int st_probe(struct device *dev)
   spin_unlock(st_index_lock);
   if (error) {
   pr_warn(st: idr allocation failed: %d\n, error);
 - goto out_put_disk;
 + goto out_put_queue;
   }
  
   if (dev_num  ST_MAX_TAPES) {
 @@ -4222,6 +4226,8 @@ out_put_index:
   spin_lock(st_index_lock);
   idr_remove(st_index_idr, dev_num);
   spin_unlock(st_index_lock);
 +out_put_queue:
 + blk_put_queue(disk-queue);
  out_put_disk:
   put_disk(disk);
   kfree(tpnt);


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:46 PM, Bart Van Assche wrote:

The SCSI EH keeps trying until all outstanding request have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for


I don't think so (ends up calling lpfc_sli_cancel_iocbs() via 
lpfc_hba_down_post() after shutting down the mailbox) but I've not seen 
the EH escalate all the way to host reset in most of my testing - 
usually some time after reaching the bus reset remaining IOs timeout and 
the error bubbles up to device-mapper (all the cases I'm looking at are 
devices managed by a dm-multipath target).


The problem is that getting to this stage can take a very long time - 
much longer than most cluster's node eviction timer for e.g. which is 
the source of much of the complaint about this behaviour.



outstanding requests ? If not, how about modifying
lpfc_host_reset_handler() such that it finishes all outstanding requests
if the remote port is not reachable ?


I'm not sure how safe that is in this situation - James mentioned in the 
I_T nexus reset thread concerns about frames that could be delayed etc. 
in the fabric if the host unilaterally abandons IOs (not sure of the 
details for lpfc at this level).


Regards,
Bryn.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 14:28, Bryn M. Reeves wrote:

On 03/15/2013 12:46 PM, Bart Van Assche wrote:

The SCSI EH keeps trying until all outstanding request have been
finished. Does lpfc_host_reset_handler() invoke scsi_done() for


I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
lpfc_hba_down_post() after shutting down the mailbox) but I've not seen
the EH escalate all the way to host reset in most of my testing -
usually some time after reaching the bus reset remaining IOs timeout and
the error bubbles up to device-mapper (all the cases I'm looking at are
devices managed by a dm-multipath target).

The problem is that getting to this stage can take a very long time -
much longer than most cluster's node eviction timer for e.g. which is
the source of much of the complaint about this behaviour.


outstanding requests ? If not, how about modifying
lpfc_host_reset_handler() such that it finishes all outstanding requests
if the remote port is not reachable ?


I'm not sure how safe that is in this situation - James mentioned in the
I_T nexus reset thread concerns about frames that could be delayed etc.
in the fabric if the host unilaterally abandons IOs (not sure of the
details for lpfc at this level).


How about using the value of scsi_cmnd.jiffies_at_alloc to finish only 
those SCSI commands in the host reset handler that exceeded a certain 
processing time ?


Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] scsi: Use W_LUN for scanning

2013-03-15 Thread Steffen Maier
While we're at it: I recently figured that there are targets responding 
to inquiry with PQ=1  PDT=31 for LUN0 if LUN0 has no backing device 
(e.g. no disk mapped for the initiator host). While this is likely to 
work with in-kernel lun scanning, the kernel does not even allocate an 
sg dev in this case.
If the target (such do exist) now also does not support report_luns on 
WLUN, this effectively prevents any user space lun discovery.
E.g. non-NPIV zfcp-attached LUNs cannot be discovered, because we cannot 
do in-kernel scanning due to the lack of initiator lun masking.


The reason for Linux ignoring LUNs with PDT=31 and PQ=1 is:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=84961f28e9d13a4b193d0c8545f3c060c1890ff3
[SCSI] Don't add scsi_device for devices that return PQ=1, PDT=0x1f
from kernel 2.6.19.
Since Linux on System z could no longer perform report luns with IBM 
DS8000, the following workaround was implemented:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=01b291bd66564b4bd826326af6bd0b6d17e99439
[SCSI] fix check of PQ and PDT bits for WLUNs
in kernel 2.6.27.
However, this only works for WLUNs reporting PDT=31 and PQ=1 but not for 
LUN0.
What made it worse is that, the attached LUN looks perfect from a zfcp 
status point of view but is still missing any user visible handle for 
the scsi midlayer, so I was puzzled as a user despite the otherwise 
clear scsi log message:

scsi scan: peripheral device type of 31, ***no device added***.

On 03/15/2013 10:46 AM, Hannes Reinecke wrote:

SAM advertises the use of a Well-known LUN (W_LUN) for scanning.
As this avoids exposing LUN 0 (which might be a valid LUN) for
all initiators it is the preferred method for LUN scanning on
some arrays.
So we should be using W_LUN for scanning, too. If the W_LUN is
not supported we'll fall back to use LUN 0.
For broken W_LUN implementations a new blacklist flag
'BLIST_NO_WLUN' is added.

Signed-off-by: Hannes Reinecke h...@suse.de

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3e58b22..f4ccdea 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1312,6 +1312,7 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
unsigned int num_luns;
unsigned int retries;
int result;
+   int w_lun = SCSI_W_LUN_REPORT_LUNS;
struct scsi_lun *lunp, *lun_data;
u8 *data;
struct scsi_sense_hdr sshdr;
@@ -1337,11 +1338,20 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
return 0;
if (starget-no_report_luns)
return 1;
+   if (bflags  BLIST_NO_WLUN)
+   w_lun = 0;

+retry_report_lun_scan:
if (!(sdev = scsi_device_lookup_by_target(starget, 0))) {
-   sdev = scsi_alloc_sdev(starget, 0, NULL);
-   if (!sdev)
-   return 0;
+   sdev = scsi_alloc_sdev(starget, w_lun, NULL);
+   if (!sdev) {
+   if (w_lun != 0) {
+   w_lun = 0;
+   sdev = scsi_alloc_sdev(starget, w_lun, NULL);
+   }



+   if (!sdev)
+   return 0;


Semantically this belongs to the 2nd scsi_alloc_sdev(starget, w_lun, 
NULL) i.e. inside (at the end of) the body of if (w_lun != 0).

Then you can even merge the outer and inner if statement to
if (!sdev  wlun != 0)
Semantics: WLUN did not work and we haven't already tried LUN0.
Maybe it's just me but I found it more difficult to read if the 2nd 
check is on its own inside the outer if statement with exactly the same 
boolean expression.




+   }
if (scsi_device_get(sdev)) {
__scsi_remove_device(sdev);
return 0;


Now that we not only use LUN0, it would be nice to reflect the currently 
used LUN in the corresponding scsi logging statements for debugging and 
problem determination purposes.
It seems as if devname only contains HCT but not L, which can now be 
either WLUN or LUN0, e.g. here:



for (retries = 0; retries  3; retries++) {
SCSI_LOG_SCAN_BUS(3, printk (KERN_INFO scsi scan: Sending
 REPORT LUNS to %s (try %d)\n, devname,
retries));

result = scsi_execute_req(sdev, scsi_cmd, DMA_FROM_DEVICE,
  lun_data, length, sshdr,
  SCSI_TIMEOUT + 4 * HZ, 3, NULL);

SCSI_LOG_SCAN_BUS(3, printk (KERN_INFO scsi scan: REPORT LUNS
 %s (try %d) result 0x%x\n, result
?  failed : successful, retries, result));



@@ -1418,6 +1428,18 @@ static int scsi_report_lun_scan(struct scsi_target 
*starget, int bflags,
}

if (result) {
+ 

Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Mike Christie
On 03/15/2013 08:41 AM, Bart Van Assche wrote:
 On 03/15/13 14:28, Bryn M. Reeves wrote:
 On 03/15/2013 12:46 PM, Bart Van Assche wrote:
 The SCSI EH keeps trying until all outstanding request have been
 finished. Does lpfc_host_reset_handler() invoke scsi_done() for


It does not really matter at that point for the scsi command timeout
case. When blk_complete_request is called by scsi_done, it will see the
command has timed out and so it will not be processed by that normal
completion path. The scsi eh basically owns the completion processing of
the command once it has timed out.

For the cleanup when the port is not reachable you had suggested is what
Hannes is proposing in the I_T nexus reset patch. The problem is making
sure that if the driver/class returns SUCCESS or FAST_IO_FAIL then they
do not touch the scsi cmnd struct again.


 I don't think so (ends up calling lpfc_sli_cancel_iocbs() via
 lpfc_hba_down_post() after shutting down the mailbox) but I've not seen
 the EH escalate all the way to host reset in most of my testing -
 usually some time after reaching the bus reset remaining IOs timeout and
 the error bubbles up to device-mapper (all the cases I'm looking at are
 devices managed by a dm-multipath target).

 The problem is that getting to this stage can take a very long time -
 much longer than most cluster's node eviction timer for e.g. which is
 the source of much of the complaint about this behaviour.

 outstanding requests ? If not, how about modifying
 lpfc_host_reset_handler() such that it finishes all outstanding requests
 if the remote port is not reachable ?

 I'm not sure how safe that is in this situation - James mentioned in the
 I_T nexus reset thread concerns about frames that could be delayed etc.
 in the fabric if the host unilaterally abandons IOs (not sure of the
 details for lpfc at this level).
 
 How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
 those SCSI commands in the host reset handler that exceeded a certain
 processing time ?
 

We basically do this now. When a scsi command times out the scsi layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or timeout. When all
commands have finished or timedout, then we start the scsi eh code. So,
by the time we have go to the scsi eh callbacks we are in a state where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However if the abort
fails and you need to escalate to operations like resets which interfere
with multiple commands, then the driver/scsi-ml does not have much
choice in what it does cleanup wise. There would be no point in checking
the jiffies_at_alloc. The commands that are going to be affected by the
tmf or host reset operation must be returned to the scsi-ml for retries
or failure upwards.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bart Van Assche

On 03/15/13 19:51, Mike Christie wrote:

On 03/15/2013 08:41 AM, Bart Van Assche wrote:

How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
those SCSI commands in the host reset handler that exceeded a certain
processing time ?


We basically do this now. When a scsi command times out the scsi layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or timeout. When all
commands have finished or timedout, then we start the scsi eh code. So,
by the time we have go to the scsi eh callbacks we are in a state where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However if the abort
fails and you need to escalate to operations like resets which interfere
with multiple commands, then the driver/scsi-ml does not have much
choice in what it does cleanup wise. There would be no point in checking
the jiffies_at_alloc. The commands that are going to be affected by the
tmf or host reset operation must be returned to the scsi-ml for retries
or failure upwards.


Hello Mike,

It seems like there is a misunderstanding. With my comment I was not 
referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp 
keep track of outstanding SCSI requests. With the SRP protocol it is 
possible to tell the InfiniBand HCA not to deliver completions for 
outstanding requests by closing the connection used for SRP 
communication. Hence my suggestion to finish SCSI commands that were 
queued longer than a certain time ago from inside the LLD host reset 
handler. I'm not sure though whether all types of FC HBA's allow 
something equivalent.


Bart.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Using sg_ses to change cooling element rpm speed

2013-03-15 Thread Jeff Johnson

Greetings,

I'm beating my head against the desk trying to use sg_ses to change the 
cooling control element requested speed code for a fan connected to an 
LSI SAS2X36 SAS expander.


I have tried the raw-edit-raw method and tried the 
--data/--index/--control method and neither seems to work.


Polling status for the cooling element (--index=26) gets me:

Element 26 descriptor:
Predicted failure=0, Disabled=0, Swap=0, status: OK
Ident=0, Hot swap=0, Fail=0, Requested on=1, Off=0
Actual speed=4920 rpm, Fan at lowest speed

Issuing the following command doesn't change the rpm speed:
'sg_ses --page=0x2 /dev/sg1 --index=26 --control --data=70,00,00,27'

Per the SES spec that string should select Fan at highest speed.

I have also tried taking the raw output, editing the bytes for the 
element and sending it back and it also does nothing. The sg_ses command 
succeeds but there are no intended results.


Am I using the command syntax correctly? I want to be sure before I go 
after the vendor for improper support of the SES specification.



Thanks!

--Jeff

--
--
Jeff Johnson
Co-Founder

Aeon Computing, Inc
High-performance computing / Lustre filesystems

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Mike Christie
On 03/15/2013 02:13 PM, Bart Van Assche wrote:
 On 03/15/13 19:51, Mike Christie wrote:
 On 03/15/2013 08:41 AM, Bart Van Assche wrote:
 How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
 those SCSI commands in the host reset handler that exceeded a certain
 processing time ?

 We basically do this now. When a scsi command times out the scsi layer
 blocks the host from processing new commands and waits for all
 outstanding commands to either finish normally or timeout. When all
 commands have finished or timedout, then we start the scsi eh code. So,
 by the time we have go to the scsi eh callbacks we are in a state where
 all the commands being processed by the eh have exceeded a certain
 processing time.

 If you mean you want to drop the block and wait part, then I think it
 could speed things up to do the abort callbacks while other IO is
 running (as long as the driver can support it). However if the abort
 fails and you need to escalate to operations like resets which interfere
 with multiple commands, then the driver/scsi-ml does not have much
 choice in what it does cleanup wise. There would be no point in checking
 the jiffies_at_alloc. The commands that are going to be affected by the
 tmf or host reset operation must be returned to the scsi-ml for retries
 or failure upwards.
 
 Hello Mike,
 
 It seems like there is a misunderstanding. With my comment I was not
 referring to the SCSI ML but to the SCSI LLD. LLD drivers like ib_srp
 keep track of outstanding SCSI requests. With the SRP protocol it is
 possible to tell the InfiniBand HCA not to deliver completions for
 outstanding requests by closing the connection used for SRP
 communication. Hence my suggestion to finish SCSI commands that were
 queued longer than a certain time ago from inside the LLD host reset

Are you saying you would have a class/driver timer running that
determines when to run this? What you describe sounds like it would
help. I am just wondering when it would be run.

If I understand you correctly, is iscsi doing what you describe when we
escalate to session dropping and relogin?


 handler. I'm not sure though whether all types of FC HBA's allow
 something equivalent.
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC] scsi: Use W_LUN for scanning

2013-03-15 Thread Douglas Gilbert

On 13-03-15 05:46 AM, Hannes Reinecke wrote:

SAM advertises the use of a Well-known LUN (W_LUN) for scanning.
As this avoids exposing LUN 0 (which might be a valid LUN) for
all initiators it is the preferred method for LUN scanning on
some arrays.
So we should be using W_LUN for scanning, too. If the W_LUN is
not supported we'll fall back to use LUN 0.
For broken W_LUN implementations a new blacklist flag
'BLIST_NO_WLUN' is added.


There are proposals at T10 for feature sets to be added to
SCSI (similar to what ATA devices have). Perhaps we could
have something similar at the OS level: an opaque call
that makes some general decisions based on what we know
about the transport, HBA and possibly target/LU. So if
we are looking at a USB transport not doing UASP, then
prefer to probe LUN 0 rather than the to probe via
REPORT LUNS W_LUN. Many other (somewhat) advanced SCSI
techniques could be filtered in a similar way (e.g. if
it's a USB device assume badly implemented SCSI-2
compliance).

We could still keep the blacklist and, if we don't already
have it, add whitelist logic (e.g. for 0.001% of well-behaved
USB devices).

Doug Gilbert


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html