date:20180720

[PATCH v2] scsi_debug: add cmd abort option to every_nth

2018-07-20 Thread Douglas Gilbert

This patch is motivated by a response in the thread:
  Re: [PATCH 0/5]stop normal completion path entering a timeout req
by Jianchao Wang . It generalizes the error injection of
blk_abort_request() to use scsi_debug's "every_nth" mechanism.
Ref with original patch to scsi_debug:
  https://lore.kernel.org/lkml/a68ad043-26a1-d3d8-2009-504ba4230...@oracle.com/

Also convert two vmalloc/memset(0) to vzalloc() calls.

Signed-off-by: Douglas Gilbert 
---

Changes since original version:
  - thanks to changes/fixes suggested by Bart Van Assche, mainly
to avoid calling scsi_done() in worker in the abort case,
this version works

The scsi mid-level call scsi_debug_abort but at the time this driver
has already removed the command context so the abort is ignored.

One small wrinkle:
...
sd 0:0:0:0: scsi_debug: tag=0x4d, cmd 28 00 00 00 27 c0 00 00 10 00 
scsi_debug: tag=0xa7, cmd 28 00 00 00 27 d0 00 00 10 00 
sd 0:0:0:0: abort request tag 167
scsi_debug:sdebug_q_cmd_complete: bypassing scsi_done() due to aborted cmd
scsi_debug_abort: command not found
scsi_debug: tag=0xa7, cmd 28 00 00 00 27 e0 00 00 10 00 
scsi_debug: tag=0xaa, cmd 28 00 00 00 27 f0 00 00 10 00
...

My ddpt utility is sending those READs to /dev/sg0 using
ioctl(SG_IO_v3) and is not informed that the READ on LBA
0x27d0 has failed.

The patch is against MKP's 4.19/scsi-queue branch.

 drivers/scsi/scsi_debug.c | 47 ++-
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 4dda192b8fba..2c816ecb0a92 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -164,29 +164,29 @@ static const char *sdebug_version_date = "20180128";
 #define SDEBUG_OPT_RESET_NOISE 0x2000
 #define SDEBUG_OPT_NO_CDB_NOISE0x4000
 #define SDEBUG_OPT_HOST_BUSY   0x8000
+#define SDEBUG_OPT_CMD_ABORT   0x1
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
  SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
  SDEBUG_OPT_TRANSPORT_ERR | \
  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
  SDEBUG_OPT_SHORT_TRANSFER | \
- SDEBUG_OPT_HOST_BUSY)
+ SDEBUG_OPT_HOST_BUSY | \
+ SDEBUG_OPT_CMD_ABORT)
 /* When "every_nth" > 0 then modulo "every_nth" commands:
  *   - a missing response is simulated if SDEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SDEBUG_OPT_RECOVERED_ERR is set.
  *   - a TRANSPORT_ERROR is simulated on successful read and write
  * commands if SDEBUG_OPT_TRANSPORT_ERR is set.
+ *   - similarly for DIF_ERR, DIX_ERR, SHORT_TRANSFER, HOST_BUSY and
+ * CMD_ABORT
  *
- * When "every_nth" < 0 then after "- every_nth" commands:
- *   - a missing response is simulated if SDEBUG_OPT_TIMEOUT is set
- *   - a RECOVERED_ERROR is simulated on successful read and write
- * commands if SDEBUG_OPT_RECOVERED_ERR is set.
- *   - a TRANSPORT_ERROR is simulated on successful read and write
- * commands if _DEBUG_OPT_TRANSPORT_ERR is set.
- * This will continue on every subsequent command until some other action
- * occurs (e.g. the user * writing a new value (other than -1 or 1) to
- * every_nth via sysfs).
+ * When "every_nth" < 0 then after "- every_nth" commands the selected
+ * error will be injected. The error will be injected on every subsequent
+ * command until some other action occurs; for example, the user writing
+ * a new value (other than -1 or 1) to every_nth:
+ *  echo 0 > /sys/bus/pseudo/drivers/scsi_debug/every_nth
  */
 
 /* As indicated in SAM-5 and SPC-4 Unit Attentions (UAs) are returned in
@@ -281,6 +281,7 @@ struct sdebug_defer {
int issuing_cpu;
bool init_hrt;
bool init_wq;
+   bool aborted;   /* true when blk_abort_request() already called */
enum sdeb_defer_type defer_t;
 };
 
@@ -296,6 +297,7 @@ struct sdebug_queued_cmd {
unsigned int inj_dix:1;
unsigned int inj_short:1;
unsigned int inj_host_busy:1;
+   unsigned int inj_cmd_abort:1;
 };
 
 struct sdebug_queue {
@@ -3792,6 +3794,7 @@ static struct sdebug_queue *get_queue(struct scsi_cmnd 
*cmnd)
 /* Queued (deferred) command completions converge here. */
 static void sdebug_q_cmd_complete(struct sdebug_defer *sd_dp)
 {
+   bool aborted = sd_dp->aborted;
int qc_idx;
int retiring = 0;
unsigned long iflags;
@@ -3801,6 +3804,8 @@ static void sdebug_q_cmd_complete(struct sdebug_defer 
*sd_dp)
struct sdebug_dev_info *devip;
 
sd_dp->defer_t = SDEB_DEFER_NONE;
+   if (unlikely(aborted))
+   sd_dp->aborted = false;
qc_idx = sd_dp->qc_idx;
sqp = sdebug_q_arr

Re: [PATCH] tcmu: fix crash for dereferencing the released udev->mb_addr memory

2018-07-20 Thread Xiubo Li


On 2018/7/20 23:13, Mike Christie wrote:

On 07/19/2018 07:34 PM, Xiubo Li wrote:

On 2018/7/19 23:49, Mike Christie wrote:

On 07/19/2018 09:30 AM, xiu...@redhat.com wrote:

From: Xiubo Li 

The logs are:

BUG: unable to handle kernel NULL pointer dereference at 0040
IP: [] tcmu_reset_ring_store+0x149/0x240 [target_core_user]
PGD 8e254067 PUD e255067 PMD 0
Oops: 0002 [#1] SMP
[...]
CPU: 0 PID: 36077 Comm: tcmu-runner Kdump: loaded Not tainted 
3.10.0-924.el7.test.x86_64 #1

It is not clear to me how you hit this. It's not a RHEL only bug is it?
Are you sure you are hitting it during device removal?

Yes, not only the RHEL bug, for the RHEL we have sync the related code
to the upstream. Before as I remembered I have hit one sudden crash by
using the SCSI repo code when testing the tcmu-runner PR#402:
tcmu-runner: add reset netlink support.

The SCSI repo is :
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git

Since the kdump service was not working for the upstream code in my
setups that time, so could only get very simple logs. This time I was
using Prasanna's script and RHEL code with the netlink and UIO crash
fixing patches to reproduce this. The test script please see the attachment.


If we are calling tcmu_reset_ring_store isn't there a refcount on the
item? So we cannot call tcmu_destroy_device -> tcmu_dev_kref_release
until after that has been released, so they would never run at the same
time. And, if userspace were to try to open/write to that reset file
after the rmdir has been done on the se_device then configfs detects
this and fails the open.

With the tcmu-runner PR#402, when starting tcmu-runner the daemon will do :
1, block netlink
2, reset netlink events
3, block device --> reset ring --> open device --> unblock device

If the device was only created and not yet enabled you will hit the
crash here.

If in your test you kill runner while a enable is in process then this
would happen.


As you mentioned it is possible only created then runner is killed.


4, unblock netlink


Currently there may has one bug in systemd service, because we hit one
odd issue by using the test script:
When executing 'systemctl restart tcmu-runner', this process seems
waiting one signal from somewhere by stucking itself, if we do nothing
It will wait for about 6 ~20 minutes(the time comes from our test
result) to recovery from that. And we can execute 'systemctl restart
gluster-blockd' in another terminal, which could stop the stuck state
above immediately.

[root@localhost ~]# ps -aux | grep -e tcmu -e
target
root  2021  0.0  0.0  0 0 ?S<   15:09   0:00
[target_completi]
root  2386  0.6  0.5 226232 11096 ?Ds   15:11   0:00
/usr/bin/python /usr/bin/targetctl clear
root  2400  0.0  0.0 134816  1316 pts/0S+   15:11   0:00
systemctl restart tcmu-runnerroot  2415
0.0  0.0 112704   988 pts/1R+   15:11   0:00 grep --color=auto -e
tcmu -e target
[root@localhost ~]# cat
/proc/2400/stack

[]
poll_schedule_timeout+0x55/0xb0

[]
do_sys_poll+0x48d/0x590

[]
SyS_ppoll+0x1d3/0x1f0

[]
system_call_fastpath+0x1c/0x21


And with IOs going on, when killing the tcmu-runner process, the

Are you just doing a test to kill runner at various times or is your
restart of your service doing this while the targetcli clear operation
is running?

There are 3 related services, and here '-->' means will Bindto and depend to
gluster-blockd.service --> gluster-block-target.service --> tcmu-runner 
service


I was running the script as I have attached in last mail and in another 
terminal running the 'systemctl restart gluster-blockd' cmd when this 
crash is hit. Because as I mentioned without the 'systemctl restart 
gluster-blockd' in another terminal, the script will be stuck in 
'systemctl restart tcmu-runner' for a very long time.


When resarting the gluster-blockd, it will start the tcmu-runner and 
gluster-block-target first.


And it is possible that the targetcli clear operation is stuck somewhere 
then the runner is restarted.



If you re killing it at various times, how do you know you are not
killing it during a enable operation and hitting this bug there?

It is very possible.



If you are killing on purpose as part of your restart operation why are
you killing it when the targetcli clear operation needs the backends to
complete their IO as part of the tpg disablement process.


Just after runner service is killed then the gluster-block-target 
service will be killed too, and it will invoke the 'targetctl clear' 
when stopping.


We are testing the netlink reset and the ring reset feature, so the 
script is killing the runner daemon to emulate the D state cases. And 
the script also emulating the crash case of the tcmu-runner daemon for 
some reasons.





'targetctl clear' process will stuck in [1] first. Then we restart the
tcmu-runner, which will reset ring with accessing the udev->mb_addr,
currently we still could access the
/

Re: [PATCH] scsi_debug: add cmd abort option to every_nth

2018-07-20 Thread Bart Van Assche

On Fri, 2018-07-20 at 15:21 -0400, Douglas Gilbert wrote:
>  /* Complete the processing of the thread that queued a SCSI command to this
> @@ -4459,6 +4462,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, 
> struct sdebug_dev_info *devip,
>   sd_dp->issuing_cpu = raw_smp_processor_id();
>   sd_dp->defer_t = SDEB_DEFER_WQ;
>   schedule_work(&sd_dp->ew.work);
> + if (unlikely(sqcp->inj_cmd_abort)) {
> + blk_abort_request(cmnd->request);
> + sdev_printk(KERN_INFO, sdp, "abort request tag %d\n",
> + cmnd->request->tag);
> + }
>   }
>   if (unlikely((SDEBUG_OPT_Q_NOISE & sdebug_opts) &&
>(scsi_result == device_qfull_result)))

Should the sdev_printk() call occur before the blk_abort_request() call to
avoid that the sdev_printk() call triggers a use-after-free?

Does the above change cause schedule_resp() to call both blk_abort_request()
and scsi_done()? I think that's wrong. A SCSI driver should call one of
these two functions but not both.

Thanks,

Bart.

Re: [PATCH 02/15] target: fix isid copying and comparision

2018-07-20 Thread Bart Van Assche

On Fri, 2018-07-20 at 16:08 -0500, Mike Christie wrote:
> Hey Bart and Christoph,
> 
> Bart, I noticed we basically had what you are requesting but Christoph
> had moved the id code from the fabric drivers to lio core in this commit:
> 
> commit 2650d71e244fb3637b5f58a0080682a8bf9c7091
> Author: Christoph Hellwig 
> Date: Fri May 1 17:47:58 2015 +0200
> 
> target: move transport ID handling to the core

Hello Mike,

I'm not in favor of reverting Christoph's patch because that patch simplified
the target code significantly. On the other hand, it's inconvenient that with
the current approach there is some code and knowledge in the target core that
should be in target drivers. I think that the code for parsing the initiator
name in srp_get_pr_transport_id() should be in the SRP target driver instead
of the core. When I added support for a new initiator name format in the SRP
target driver I overlooked that I had to update srp_get_pr_transport_id()
because that function is in the core instead of ib_srpt.c. See also commit
2dc98f09f9e6 ("IB/srpt: Use the source GUID as session name"). Christoph, do
you want me to add support for the new ib_srpt initiator name format in 
drivers/target/target_core_fabric_lib.c or should I find a way to move the
code for parsing the initiator name into ib_srpt.c?

Thanks,

Bart.

Re: [PATCH 02/15] target: fix isid copying and comparision

2018-07-20 Thread Mike Christie

On 07/20/2018 04:08 PM, Mike Christie wrote:
> On 07/19/2018 11:13 AM, Mike Christie wrote:
>> > On 07/19/2018 10:15 AM, Bart Van Assche wrote:
>>> >> On Wed, 2018-07-18 at 20:02 -0500, Mike Christie wrote:
 >>> On 07/18/2018 07:03 PM, Mike Christie wrote:
>  On 07/18/2018 05:09 PM, Bart Van Assche wrote:
>> > [ ... ]
>> > is that these involve a transport ID and that that transport ID 
>> > can be up to 228
>> > bytes long for iSCSI.
> 
>  I am talking about the Initiator Session ID above. That along with 
>  the
>  iscsi name make up the Initiator Port Transport ID. In spc4r37 
>  checkout
>  table 508 or in SAM 5r21 checkout table A.4.
> 
>  So in the SCSI specs as part of the Initiator Port Transport ID we 
>  have
>  this from that SAM table:
> 
>  The Initiator Session Identifier (ISID) portion of the string is a 
>  UTF-8
>  encoded hexadecimal representation of a six byte binary value.
> 
>  ---
> 
>  In the PR parts of SPC it sometimes mentions only "Transport ID" but
>  then clarifies the initiator port so I am assuming in those cases it
>  means "Initiator Port Transport ID" so it is both the name and isid 
>  for
>  iscsi.
 >>>
 >>> It looks like we are supposed to go by what the initiator specifies in
 >>> the TPID field, so it can be either the Transport ID or Initiator Port
 >>> Transport ID.
>>> >>
>>> >> Hello Mike,
>>> >>
>>> >> Since the ISID is iSCSI-specific I think that all code that knows about 
>>> >> the
>>> >> ISID and its encoding should be in the iSCSI target driver instead of the
>>> >> target core. Do you think an approach similar to that of the SCST 
>>> >> function
>>> >> iscsi_get_initiator_port_transport_id() can be implemented in LIO? The 
>>> >> caller
>> > 
>> > Yeah, I can make that work.
> Hey Bart and Christoph,
> 
> Bart, I noticed we basically had what you are requesting but Christoph
> had moved the id code from the fabric drivers to lio core in this commit:
> 
> commit 2650d71e244fb3637b5f58a0080682a8bf9c7091
> Author: Christoph Hellwig 
> Date: Fri May 1 17:47:58 2015 +0200
> 
> target: move transport ID handling to the core
> 
> 
> So what do you guys want to do here? Revert Christoph's patch and go
> back to that style or have lio core do the transport ID processing?

Oh yeah for the latter, we can make it so at least target_core_pr.c does
not have to do any isid conversions. We could just rename sess_bin_isid
to sess_str_isid, store the isid as a string, and then never convert it
between the binary and string format. All the isid tests in
target_core_pr.c would be strcmps.

Re: [PATCH 02/15] target: fix isid copying and comparision

2018-07-20 Thread Mike Christie

On 07/19/2018 11:13 AM, Mike Christie wrote:
> On 07/19/2018 10:15 AM, Bart Van Assche wrote:
>> On Wed, 2018-07-18 at 20:02 -0500, Mike Christie wrote:
>>> On 07/18/2018 07:03 PM, Mike Christie wrote:
 On 07/18/2018 05:09 PM, Bart Van Assche wrote:
> [ ... ]
> is that these involve a transport ID and that that transport ID can be up 
> to 228
> bytes long for iSCSI.

 I am talking about the Initiator Session ID above. That along with the
 iscsi name make up the Initiator Port Transport ID. In spc4r37 checkout
 table 508 or in SAM 5r21 checkout table A.4.

 So in the SCSI specs as part of the Initiator Port Transport ID we have
 this from that SAM table:

 The Initiator Session Identifier (ISID) portion of the string is a UTF-8
 encoded hexadecimal representation of a six byte binary value.

 ---

 In the PR parts of SPC it sometimes mentions only "Transport ID" but
 then clarifies the initiator port so I am assuming in those cases it
 means "Initiator Port Transport ID" so it is both the name and isid for
 iscsi.
>>>
>>> It looks like we are supposed to go by what the initiator specifies in
>>> the TPID field, so it can be either the Transport ID or Initiator Port
>>> Transport ID.
>>
>> Hello Mike,
>>
>> Since the ISID is iSCSI-specific I think that all code that knows about the
>> ISID and its encoding should be in the iSCSI target driver instead of the
>> target core. Do you think an approach similar to that of the SCST function
>> iscsi_get_initiator_port_transport_id() can be implemented in LIO? The caller
> 
> Yeah, I can make that work.

Hey Bart and Christoph,

Bart, I noticed we basically had what you are requesting but Christoph
had moved the id code from the fabric drivers to lio core in this commit:

commit 2650d71e244fb3637b5f58a0080682a8bf9c7091
Author: Christoph Hellwig 
Date: Fri May 1 17:47:58 2015 +0200

target: move transport ID handling to the core


So what do you guys want to do here? Revert Christoph's patch and go
back to that style or have lio core do the transport ID processing?


> 
>> of that function is in source file scst/src/scst_targ.c:
>>
>>  res = sess->tgt->tgtt->get_initiator_port_transport_id(
>>  sess->tgt, sess, &sess->transport_id);
>>
>> Thanks,
>>
>> Bart.--
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

we provide editing

2018-07-20 Thread Scott


Hi,

We provide image editing services like - photo cut out; photo clipping
path; photo masking; photo
shadow creation; photo color correction; photo retouching; beauty model
retouching on skin, face,
body; glamour retouching; products retouching and other image editing.

We are also offering to deliver testing for you, so that you get to know
our quality first hand.

If you want to explore further, please reply back.

Thanks and Regards,
Scott

[PATCH] scsi_debug: add cmd abort option to every_nth

2018-07-20 Thread Douglas Gilbert

This patch is motivated by a response in the thread:
  Re: [PATCH 0/5]stop normal completion path entering a timeout req
by Jianchao Wang . It generalizes the error injection of
blk_abort_request() to use scsi_debug's "every_nth" mechanism.
Ref with original patch to scsi_debug:
  https://lore.kernel.org/lkml/a68ad043-26a1-d3d8-2009-504ba4230...@oracle.com/

Signed-off-by: Douglas Gilbert 
---
This patch may not be correct, or sufficient (nothing else in SCSI
subsystem calls blk_abort_request() ) and it freezes my laptop with
lk 2.18.0-rc1 kernel based on MKP's repository. However that crash
may be the point of the original patch.

Needs positive feedback from knowledgeable parties.

 drivers/scsi/scsi_debug.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 4dda192b8fba..9eda71138744 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -164,29 +164,29 @@ static const char *sdebug_version_date = "20180128";
 #define SDEBUG_OPT_RESET_NOISE 0x2000
 #define SDEBUG_OPT_NO_CDB_NOISE0x4000
 #define SDEBUG_OPT_HOST_BUSY   0x8000
+#define SDEBUG_OPT_CMD_ABORT   0x1
 #define SDEBUG_OPT_ALL_NOISE (SDEBUG_OPT_NOISE | SDEBUG_OPT_Q_NOISE | \
  SDEBUG_OPT_RESET_NOISE)
 #define SDEBUG_OPT_ALL_INJECTING (SDEBUG_OPT_RECOVERED_ERR | \
  SDEBUG_OPT_TRANSPORT_ERR | \
  SDEBUG_OPT_DIF_ERR | SDEBUG_OPT_DIX_ERR | \
  SDEBUG_OPT_SHORT_TRANSFER | \
- SDEBUG_OPT_HOST_BUSY)
+ SDEBUG_OPT_HOST_BUSY | \
+ SDEBUG_OPT_CMD_ABORT)
 /* When "every_nth" > 0 then modulo "every_nth" commands:
  *   - a missing response is simulated if SDEBUG_OPT_TIMEOUT is set
  *   - a RECOVERED_ERROR is simulated on successful read and write
  * commands if SDEBUG_OPT_RECOVERED_ERR is set.
  *   - a TRANSPORT_ERROR is simulated on successful read and write
  * commands if SDEBUG_OPT_TRANSPORT_ERR is set.
+ *   - similarly for DIF_ERR, DIX_ERR, SHORT_TRANSFER, HOST_BUSY and
+ * CMD_ABORT
  *
- * When "every_nth" < 0 then after "- every_nth" commands:
- *   - a missing response is simulated if SDEBUG_OPT_TIMEOUT is set
- *   - a RECOVERED_ERROR is simulated on successful read and write
- * commands if SDEBUG_OPT_RECOVERED_ERR is set.
- *   - a TRANSPORT_ERROR is simulated on successful read and write
- * commands if _DEBUG_OPT_TRANSPORT_ERR is set.
- * This will continue on every subsequent command until some other action
- * occurs (e.g. the user * writing a new value (other than -1 or 1) to
- * every_nth via sysfs).
+ * When "every_nth" < 0 then after "- every_nth" commands the selected
+ * error will be injected. The error will be injected on every subsequent
+ * command until some other action occurs; for example, the user writing
+ * a new value (other than -1 or 1) to every_nth:
+ *  echo 0 > /sys/bus/pseudo/drivers/scsi_debug/every_nth
  */
 
 /* As indicated in SAM-5 and SPC-4 Unit Attentions (UAs) are returned in
@@ -296,6 +296,7 @@ struct sdebug_queued_cmd {
unsigned int inj_dix:1;
unsigned int inj_short:1;
unsigned int inj_host_busy:1;
+   unsigned int inj_cmd_abort:1;
 };
 
 struct sdebug_queue {
@@ -4312,7 +4313,8 @@ static void setup_inject(struct sdebug_queue *sqp,
if (sdebug_every_nth > 0)
sqcp->inj_recovered = sqcp->inj_transport
= sqcp->inj_dif
-   = sqcp->inj_dix = sqcp->inj_short = 0;
+   = sqcp->inj_dix = sqcp->inj_short
+   = sqcp->inj_host_busy = sqcp->inj_cmd_abort = 0;
return;
}
sqcp->inj_recovered = !!(SDEBUG_OPT_RECOVERED_ERR & sdebug_opts);
@@ -4321,6 +4323,7 @@ static void setup_inject(struct sdebug_queue *sqp,
sqcp->inj_dix = !!(SDEBUG_OPT_DIX_ERR & sdebug_opts);
sqcp->inj_short = !!(SDEBUG_OPT_SHORT_TRANSFER & sdebug_opts);
sqcp->inj_host_busy = !!(SDEBUG_OPT_HOST_BUSY & sdebug_opts);
+   sqcp->inj_cmd_abort = !!(SDEBUG_OPT_CMD_ABORT & sdebug_opts);
 }
 
 /* Complete the processing of the thread that queued a SCSI command to this
@@ -4459,6 +4462,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct 
sdebug_dev_info *devip,
sd_dp->issuing_cpu = raw_smp_processor_id();
sd_dp->defer_t = SDEB_DEFER_WQ;
schedule_work(&sd_dp->ew.work);
+   if (unlikely(sqcp->inj_cmd_abort)) {
+   blk_abort_request(cmnd->request);
+   sdev_printk(KERN_INFO, sdp, "abort request tag %d\n",
+   cmnd->request->tag);
+   }
}
if (unl

we provide editing

2018-07-20 Thread Scott


Hi,

We provide image editing services like - photo cut out; photo clipping
path; photo masking; photo
shadow creation; photo color correction; photo retouching; beauty model
retouching on skin, face,
body; glamour retouching; products retouching and other image editing.

We are also offering to deliver testing for you, so that you get to know
our quality first hand.

If you want to explore further, please reply back.

Thanks and Regards,
Scott

Re: [PATCH] tcmu: fix crash for dereferencing the released udev->mb_addr memory

2018-07-20 Thread Mike Christie

On 07/19/2018 07:34 PM, Xiubo Li wrote:
> On 2018/7/19 23:49, Mike Christie wrote:
>> On 07/19/2018 09:30 AM, xiu...@redhat.com wrote:
>>> From: Xiubo Li 
>>>
>>> The logs are:
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 0040
>>> IP: [] tcmu_reset_ring_store+0x149/0x240 
>>> [target_core_user]
>>> PGD 8e254067 PUD e255067 PMD 0
>>> Oops: 0002 [#1] SMP
>>> [...]
>>> CPU: 0 PID: 36077 Comm: tcmu-runner Kdump: loaded Not tainted 
>>> 3.10.0-924.el7.test.x86_64 #1
>>
>> It is not clear to me how you hit this. It's not a RHEL only bug is it?
>> Are you sure you are hitting it during device removal?
> Yes, not only the RHEL bug, for the RHEL we have sync the related code
> to the upstream. Before as I remembered I have hit one sudden crash by
> using the SCSI repo code when testing the tcmu-runner PR#402:
> tcmu-runner: add reset netlink support.
> 
> The SCSI repo is :
> git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
> 
> Since the kdump service was not working for the upstream code in my
> setups that time, so could only get very simple logs. This time I was
> using Prasanna's script and RHEL code with the netlink and UIO crash
> fixing patches to reproduce this. The test script please see the attachment.
> 
>> If we are calling tcmu_reset_ring_store isn't there a refcount on the
>> item? So we cannot call tcmu_destroy_device -> tcmu_dev_kref_release
>> until after that has been released, so they would never run at the same
>> time. And, if userspace were to try to open/write to that reset file
>> after the rmdir has been done on the se_device then configfs detects
>> this and fails the open.
> With the tcmu-runner PR#402, when starting tcmu-runner the daemon will do :
> 1, block netlink
> 2, reset netlink events
> 3, block device --> reset ring --> open device --> unblock device

If the device was only created and not yet enabled you will hit the
crash here.

If in your test you kill runner while a enable is in process then this
would happen.

> 4, unblock netlink
> 
> 
> Currently there may has one bug in systemd service, because we hit one
> odd issue by using the test script:
> When executing 'systemctl restart tcmu-runner', this process seems
> waiting one signal from somewhere by stucking itself, if we do nothing
> It will wait for about 6 ~20 minutes(the time comes from our test
> result) to recovery from that. And we can execute 'systemctl restart
> gluster-blockd' in another terminal, which could stop the stuck state
> above immediately.
> 
> [root@localhost ~]# ps -aux | grep -e tcmu -e
> target 
> root  2021  0.0  0.0  0 0 ?S<   15:09   0:00
> [target_completi]   
> root  2386  0.6  0.5 226232 11096 ?Ds   15:11   0:00
> /usr/bin/python /usr/bin/targetctl clear
> root  2400  0.0  0.0 134816  1316 pts/0S+   15:11   0:00
> systemctl restart tcmu-runnerroot  2415 
> 0.0  0.0 112704   988 pts/1R+   15:11   0:00 grep --color=auto -e
> tcmu -e target
> [root@localhost ~]# cat
> /proc/2400/stack  
>
> 
> []
> poll_schedule_timeout+0x55/0xb0   
>   
> 
> []
> do_sys_poll+0x48d/0x590   
>   
> 
> []
> SyS_ppoll+0x1d3/0x1f0 
>   
> 
> []
> system_call_fastpath+0x1c/0x21
>   
> 
> 
> And with IOs going on, when killing the tcmu-runner process, the

Are you just doing a test to kill runner at various times or is your
restart of your service doing this while the targetcli clear operation
is running?

If you re killing it at various times, how do you know you are not
killing it during a enable operation and hitting this bug there?

If you are killing on purpose as part of your restart operation why are
you killing it when the targetcli clear operation needs the backends to
complete their IO as part of the tpg disablement process.


> 'targetctl clear' process will stuck in [1] first. Then we restart the
> tcmu-runner, which will reset ring with accessing the udev->mb_addr,
> currently we still could access the 
> /sys/kernel/config/target/core/user_XX/*. And while the ring reset is
> still going on, the stucked 'targetctl clear' process will be woken up
> and then tries to delete the device which will kfree(udev->mb_addr),
> here any accessing to  /sys/kernel/config/target/core/user_XX/* will stuck.

The rtslib clear operation deletes the target first and that disables
the tpg which waits for outstanding commands. It then finishes tearing
down the objects under the target, and then the backend devices are deleted.

If targetcli clear is unhung from the

Re: [PATCH 11/15] target: export initiator port values for all sessions

2018-07-20 Thread Christoph Hellwig

On Thu, Jul 19, 2018 at 04:30:36PM -0500, Mike Christie wrote:
> Just to clarify. We can create a dir from the kernel already. It is no
> problem. I am doing that in this patchset with configfs_register_group.
> 
> What Bart was requesting originally and what is missing is being able to
> add a symlink from the kernel.
> 
> I have not fully looked into it, but I think it would be something like
> taking part of configfs_symlink and making it so we can call it with
> config_items for the 2 items to be symlinked.

Yes, it shouldn't be too hard.

[PATCH] mpt3sas: correct reset of smid while clearing scsi tracker

2018-07-20 Thread Sreekanth Reddy

In mpt3sas_base_clear_st() function smid value is reseted in wrong line,
i.e. driver should reset smid value to zero after decrementing chain_offset
counter in chain_lookup table but in current code, driver is resetting smid
value before decrementing the chain_offset counter. which we are correcting
with this patch.

v1 changelog:
Added memory barriers before & after atomic_set() API.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 902610d..94359d8 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1702,6 +1702,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
return NULL;
 
chain_req = &ioc->chain_lookup[smid - 1].chains_per_smid[chain_offset];
+   /* Adding memory barrier before atomic operation. */
+   smp_mb__before_atomic();
atomic_inc(&ioc->chain_lookup[smid - 1].chain_offset);
return chain_req;
 }
@@ -3283,8 +3285,12 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
return;
st->cb_idx = 0xFF;
st->direct_io = 0;
-   st->smid = 0;
+   /* Adding memory barrier before atomic operation. */
+   smp_mb__before_atomic();
atomic_set(&ioc->chain_lookup[st->smid - 1].chain_offset, 0);
+   /* Adding memory barrier after atomic operation. */
+   smp_mb__after_atomic();
+   st->smid = 0;
 }
 
 /**
-- 
1.8.3.1

[PATCH v2] scsi_debug: add cmd abort option to every_nth

Re: [PATCH] tcmu: fix crash for dereferencing the released udev->mb_addr memory

Re: [PATCH] scsi_debug: add cmd abort option to every_nth

Re: [PATCH 02/15] target: fix isid copying and comparision

Re: [PATCH 02/15] target: fix isid copying and comparision

Re: [PATCH 02/15] target: fix isid copying and comparision

we provide editing

[PATCH] scsi_debug: add cmd abort option to every_nth

we provide editing

Re: [PATCH] tcmu: fix crash for dereferencing the released udev->mb_addr memory

Re: [PATCH 11/15] target: export initiator port values for all sessions

[PATCH] mpt3sas: correct reset of smid while clearing scsi tracker

12 matches

Site Navigation

Mail list logo

Footer information