[Question]many kernel error "neighbour: ndisc_cache: neighbor table overflow!"

2020-06-24 Thread Jack Wang
Hi Folks,

In one of our big cluster, due to capacity increase, more servers are
added to the cluster, and we saw from many pserver reporting error
message below:
 "neighbour: ndisc_cache: neighbor table overflow!"

We've tested increasing the gc_thresh values in sysctl.conf, after
reboot, the errors are gone

+# Threshold when garbage collector becomes more aggressive about
+# purging entries. Entries older than 5 seconds will be cleared
+# when over this number.  Default: 512
+net.ipv4.neigh.default.gc_thresh2 = 4096
+net.ipv6.neigh.default.gc_thresh2 = 4096
+
+# Maximum number of non-PERMANENT neighbor entries allowed.  Increase
+# this when using large numbers of interfaces and when communicating
+# with large numbers of directly-connected peers.  Default: 1024
+net.ipv4.neigh.default.gc_thresh3 = 8192
+net.ipv6.neigh.default.gc_thresh3 = 8192

But we still have many systems running in production, so my question
is: is it safe to apply the setting on the fly when servers are
running with busy traffic? or we have to apply the setting only
through sysctl during boot?

Most of our servers with default settings are running kernel 4.14.137~4.14.154

Thanks in advance!

Best regards!

Jack Wang


Re: [PATCH 4.19 00/57] 4.19.72-stable review

2019-09-10 Thread Jack Wang
Greg Kroah-Hartman  于2019年9月9日周一 下午12:19写道:
>
> This is the start of the stable review cycle for the 4.19.72 release.
> There are 57 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue 10 Sep 2019 12:09:36 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.72-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Merged, boot and tested on my testing machine, no regression found.

Thanks,
Jack Wang


Re: [PATCH AUTOSEL 4.4 04/14] perf header: Fix divide by zero error if f_header.attr_size==0

2019-08-19 Thread Jack Wang
Sasha Levin  于2019年8月6日周二 下午11:39写道:
>
> From: Vince Weaver 
>
> [ Upstream commit 7622236ceb167aa3857395f9bdaf871442aa467e ]
>
> So I have been having lots of trouble with hand-crafted perf.data files
> causing segfaults and the like, so I have started fuzzing the perf tool.
>
> First issue found:
>
> If f_header.attr_size is 0 in the perf.data file, then perf will crash
> with a divide-by-zero error.
>
> Committer note:
>
> Added a pr_err() to tell the user why the command failed.
>
> Signed-off-by: Vince Weaver 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907231100440.14532@macbook-air
> Signed-off-by: Arnaldo Carvalho de Melo 
> Signed-off-by: Sasha Levin 
> ---
>  tools/perf/util/header.c | 7 +++
>  1 file changed, 7 insertions(+)
>
Hi all,

This on cause build failure when I rebased to 4.14.140-rc1 in stable-rc tree.

util/header.c: In function 'perf_session__read_header':
util/header.c:2907:10: error: 'data' undeclared (first use in this
function); did you mean 'dots'?
  data->file.path);
  Should be fixed by:
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2904,7 +2904,7 @@ int perf_session__read_header(struct
perf_session *session)
if (f_header.attr_size == 0) {
pr_err("ERROR: The %s file's attr size field is 0
which is unexpected.\n"
   "Was the 'perf record' command properly terminated?\n",
-  data->file.path);
+  file->path);
return -EINVAL;

Regards,
Jack Wang

> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 304f5d7101436..0102dd46fb6da 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2591,6 +2591,13 @@ int perf_session__read_header(struct perf_session 
> *session)
>file->path);
> }
>
> +   if (f_header.attr_size == 0) {
> +   pr_err("ERROR: The %s file's attr size field is 0 which is 
> unexpected.\n"
> +  "Was the 'perf record' command properly terminated?\n",
> +  data->file.path);
> +   return -EINVAL;
> +   }
> +
> nr_attrs = f_header.attrs.size / f_header.attr_size;
> lseek(fd, f_header.attrs.offset, SEEK_SET);
>
> --
> 2.20.1
>


Re: [PATCH 4.14 00/53] 4.14.137-stable review

2019-08-06 Thread Jack Wang
Greg Kroah-Hartman  于2019年8月5日周一 下午3:14写道:
>
> This is the start of the stable review cycle for the 4.14.137 release.
> There are 53 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 07 Aug 2019 12:47:58 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.137-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Merge, and regression tested on my test machines,  all looks good!

Thanks,
Jack Wang


Re: [PATCH stable-4.19 1/2] KVM: nVMX: do not use dangling shadow VMCS after guest reset

2019-07-29 Thread Jack Wang
Paolo Bonzini  于2019年7月29日周一 上午11:10写道:
>
> On 29/07/19 10:58, Jack Wang wrote:
> > Vitaly Kuznetsov  于2019年7月25日周四 下午3:29写道:
> >>
> >> From: Paolo Bonzini 
> >>
> >> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
> >>
> >> If a KVM guest is reset while running a nested guest, free_nested will
> >> disable the shadow VMCS execution control in the vmcs01.  However,
> >> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> >> the VMCS12 to the shadow VMCS which has since been freed.
> >>
> >> This causes a vmptrld of a NULL pointer on my machime, but Jan reports
> >> the host to hang altogether.  Let's see how much this trivial patch fixes.
> >>
> >> Reported-by: Jan Kiszka 
> >> Cc: Liran Alon 
> >> Cc: sta...@vger.kernel.org
> >> Signed-off-by: Paolo Bonzini 
> >
> > Hi all,
> >
> > Do we need to backport the fix also to stable 4.14?  It applies
> > cleanly and compiles fine.
>
> The reproducer required newer kernels that support KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE, so it would be hard to test it.  However, the
> patch itself should be safe.
>
> Paolo

Thanks Paolo for confirmation. I'm asking because we had one incident
in our production with 4.14.129 kernel,
System is Skylake Gold cpu, first kvm errors, host hung afterwards

kernel: [1186161.091160] kvm: vmptrld   (null)/6bfc failed
kernel: [1186161.091537] kvm: vmclear fail:   (null)/6bfc
kernel: [1186186.490300] watchdog: BUG: soft lockup - CPU#54 stuck for
23s! [qemu:16639]

Hi Sasha, hi Greg,

Would be great if you can pick this patch also to 4.14 kernel.

Best regards,
Jack Wang


Re: [PATCH stable-4.19 1/2] KVM: nVMX: do not use dangling shadow VMCS after guest reset

2019-07-29 Thread Jack Wang
Vitaly Kuznetsov  于2019年7月25日周四 下午3:29写道:
>
> From: Paolo Bonzini 
>
> [ Upstream commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 ]
>
> If a KVM guest is reset while running a nested guest, free_nested will
> disable the shadow VMCS execution control in the vmcs01.  However,
> on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
> the VMCS12 to the shadow VMCS which has since been freed.
>
> This causes a vmptrld of a NULL pointer on my machime, but Jan reports
> the host to hang altogether.  Let's see how much this trivial patch fixes.
>
> Reported-by: Jan Kiszka 
> Cc: Liran Alon 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini 

Hi all,

Do we need to backport the fix also to stable 4.14?  It applies
cleanly and compiles fine.

Regards,
Jack Wang


Re: [PATCH 4.14 00/56] 4.14.133-stable review

2019-07-09 Thread Jack Wang
>
> This is the start of the stable review cycle for the 4.14.133 release.
> There are 56 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 10 Jul 2019 03:03:52 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.133-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Merged and tested with my x86_64 systems, no regression found.

Regards,
Jack Wang @ 1 & 1 IONOS Cloud GmbH


Re: BUG: aio/direct-io data corruption in 4.7

2018-11-09 Thread Jack Wang
Gregory Shapiro  于2018年11月6日周二 下午12:31写道:
>
> Hi Jack,
> I tested it in 4.9.102 and I checked the latest code from elixir
> (versions 4.19 and 4.20) and the error in code is still present there.
> More on the scenario and the bug:
> I experienced data corruption in my application (nvme based storage).
> The issue was caused because of faulty hardware, but the real problem
> is I got a correct number of bytes in io_getevents thus couldn't
> recognize it correctly the error.
> Looking at the /var/log/messages and  I saw the following errors in
> time of coruption:
Thanks for the info, Gregory.

I noticed guys from Amazon pushing the fix to upstream:
https://lore.kernel.org/patchwork/patch/1008443/
I hope it will be in upstream soon.


Regards,
Jack Wang


Re: BUG: aio/direct-io data corruption in 4.7

2018-11-09 Thread Jack Wang
Gregory Shapiro  于2018年11月6日周二 下午12:31写道:
>
> Hi Jack,
> I tested it in 4.9.102 and I checked the latest code from elixir
> (versions 4.19 and 4.20) and the error in code is still present there.
> More on the scenario and the bug:
> I experienced data corruption in my application (nvme based storage).
> The issue was caused because of faulty hardware, but the real problem
> is I got a correct number of bytes in io_getevents thus couldn't
> recognize it correctly the error.
> Looking at the /var/log/messages and  I saw the following errors in
> time of coruption:
Thanks for the info, Gregory.

I noticed guys from Amazon pushing the fix to upstream:
https://lore.kernel.org/patchwork/patch/1008443/
I hope it will be in upstream soon.


Regards,
Jack Wang


Re: [PATCH AUTOSEL 4.19 118/146] MD: fix invalid stored role for a disk

2018-11-05 Thread Jack Wang
Sasha Levin  于2018年11月1日周四 上午12:45写道:
>
> From: Shaohua Li 
>
> [ Upstream commit d595567dc4f0c1d90685ec1e2e296e2cad2643ac ]
>
> If we change the number of array's device after device is removed from array,
> then add the device back to array, we can see that device is added as active
> role instead of spare which we expected.
>
> Please see the below link for details:
> https://marc.info/?l=linux-raid=153736982015076=2
>
> This is caused by that we prefer to use device's previous role which is
> recorded by saved_raid_disk, but we should respect the new number of
> conf->raid_disks since it could be changed after device is removed.
>
> Reported-by: Gioh Kim 
> Tested-by: Gioh Kim 
> Acked-by: Guoqing Jiang 
> 9e753ba9b9b405e3902d9f08aec5f2ea58a0c317
> Signed-off-by: Shaohua Li 
> Signed-off-by: Sasha Levin 
Hi Sasha,

This patch breaks linear hotadd please also include  commit
9e753ba9b9b405e3902d9f08aec5f2ea58a0c317
MD: fix invalid stored role for a disk - try2

Regards,
Jack Wang

> ---
>  drivers/md/md.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8668793262d0..85459c17cc60 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1776,6 +1776,10 @@ static int super_1_validate(struct mddev *mddev, 
> struct md_rdev *rdev)
> } else
> set_bit(In_sync, >flags);
> rdev->raid_disk = role;
> +   if (role >= mddev->raid_disks) {
> +   rdev->saved_raid_disk = -1;
> +   rdev->raid_disk = -1;
> +   }
> break;
> }
> if (sb->devflags & WriteMostly1)
> --
> 2.17.1
>


Re: [PATCH AUTOSEL 4.19 118/146] MD: fix invalid stored role for a disk

2018-11-05 Thread Jack Wang
Sasha Levin  于2018年11月1日周四 上午12:45写道:
>
> From: Shaohua Li 
>
> [ Upstream commit d595567dc4f0c1d90685ec1e2e296e2cad2643ac ]
>
> If we change the number of array's device after device is removed from array,
> then add the device back to array, we can see that device is added as active
> role instead of spare which we expected.
>
> Please see the below link for details:
> https://marc.info/?l=linux-raid=153736982015076=2
>
> This is caused by that we prefer to use device's previous role which is
> recorded by saved_raid_disk, but we should respect the new number of
> conf->raid_disks since it could be changed after device is removed.
>
> Reported-by: Gioh Kim 
> Tested-by: Gioh Kim 
> Acked-by: Guoqing Jiang 
> 9e753ba9b9b405e3902d9f08aec5f2ea58a0c317
> Signed-off-by: Shaohua Li 
> Signed-off-by: Sasha Levin 
Hi Sasha,

This patch breaks linear hotadd please also include  commit
9e753ba9b9b405e3902d9f08aec5f2ea58a0c317
MD: fix invalid stored role for a disk - try2

Regards,
Jack Wang

> ---
>  drivers/md/md.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8668793262d0..85459c17cc60 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1776,6 +1776,10 @@ static int super_1_validate(struct mddev *mddev, 
> struct md_rdev *rdev)
> } else
> set_bit(In_sync, >flags);
> rdev->raid_disk = role;
> +   if (role >= mddev->raid_disks) {
> +   rdev->saved_raid_disk = -1;
> +   rdev->raid_disk = -1;
> +   }
> break;
> }
> if (sb->devflags & WriteMostly1)
> --
> 2.17.1
>


Re: BUG: aio/direct-io data corruption in 4.7

2018-11-05 Thread Jack Wang
Gregory Shapiro  于2018年11月5日周一 下午4:19写道:
>
> Hello, my name is Gregory Shapiro and I am a newbie on this list.
> I recently encountered data corruption as I got a kernel to
> acknowledge write ("io_getevents" system call with a correct number of
> bytes) but undergoing write to disk failed.
> After investigating the problem I found it is identical to issue found
> in direct-io.c mentioned the bellow thread.
> https://lore.kernel.org/lkml/20160921141539.ga17...@infradead.org/
> Is there a reason proposed patch didn't apply to the kernel?
> When can I expect it to be applied?
> Thanks,
>  Gregory

Hi Gregory,

Thanks for your info.
Have you tried with latest kernel other than 4.7, is the problem still there?

Could you share your test case?

Regards,
Jack Wang


Re: BUG: aio/direct-io data corruption in 4.7

2018-11-05 Thread Jack Wang
Gregory Shapiro  于2018年11月5日周一 下午4:19写道:
>
> Hello, my name is Gregory Shapiro and I am a newbie on this list.
> I recently encountered data corruption as I got a kernel to
> acknowledge write ("io_getevents" system call with a correct number of
> bytes) but undergoing write to disk failed.
> After investigating the problem I found it is identical to issue found
> in direct-io.c mentioned the bellow thread.
> https://lore.kernel.org/lkml/20160921141539.ga17...@infradead.org/
> Is there a reason proposed patch didn't apply to the kernel?
> When can I expect it to be applied?
> Thanks,
>  Gregory

Hi Gregory,

Thanks for your info.
Have you tried with latest kernel other than 4.7, is the problem still there?

Could you share your test case?

Regards,
Jack Wang


[PATCH] md: fix memleak for mempool

2018-10-19 Thread Jack Wang
From: Jack Wang 

I noticed kmemleak report memory leak when run create/stop
md in a loop, backtrace:
[<1ca975e7>] mempool_create_node+0x86/0xd0
[<95576bcd>] md_run+0x1057/0x1410 [md_mod]
[<7b45c5fc>] do_md_run+0x15/0x130 [md_mod]
[<1ede9ec0>] md_ioctl+0x1f49/0x25d0 [md_mod]
[<4142cacf>] blkdev_ioctl+0x680/0xd00

The root cause is we alloc mddev->flush_pool and
mddev->flush_bio_pool in md_run, but from do_md_stop
will not call into md_stop but __md_stop, move the
mempool_destroy to __md_stop fixes the problem for me.

The bug was introduced in 5a409b4f56d5, the fixes should go to
4.18+

Cc: Xiao Ni 
Fixes: 5a409b4f56d5 ("MD: fix lock contention for flush bios")
Signed-off-by: Jack Wang 
---
 drivers/md/md.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4c0f3e0331d5..feb6145097da 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5904,14 +5904,6 @@ static void __md_stop(struct mddev *mddev)
mddev->to_remove = _redundancy_group;
module_put(pers->owner);
clear_bit(MD_RECOVERY_FROZEN, >recovery);
-}
-
-void md_stop(struct mddev *mddev)
-{
-   /* stop the array and free an attached data structures.
-* This is called from dm-raid
-*/
-   __md_stop(mddev);
if (mddev->flush_bio_pool) {
mempool_destroy(mddev->flush_bio_pool);
mddev->flush_bio_pool = NULL;
@@ -5920,6 +5912,14 @@ void md_stop(struct mddev *mddev)
mempool_destroy(mddev->flush_pool);
mddev->flush_pool = NULL;
}
+}
+
+void md_stop(struct mddev *mddev)
+{
+   /* stop the array and free an attached data structures.
+* This is called from dm-raid
+*/
+   __md_stop(mddev);
bioset_exit(>bio_set);
bioset_exit(>sync_set);
 }
-- 
2.7.4



[PATCH] md: fix memleak for mempool

2018-10-19 Thread Jack Wang
From: Jack Wang 

I noticed kmemleak report memory leak when run create/stop
md in a loop, backtrace:
[<1ca975e7>] mempool_create_node+0x86/0xd0
[<95576bcd>] md_run+0x1057/0x1410 [md_mod]
[<7b45c5fc>] do_md_run+0x15/0x130 [md_mod]
[<1ede9ec0>] md_ioctl+0x1f49/0x25d0 [md_mod]
[<4142cacf>] blkdev_ioctl+0x680/0xd00

The root cause is we alloc mddev->flush_pool and
mddev->flush_bio_pool in md_run, but from do_md_stop
will not call into md_stop but __md_stop, move the
mempool_destroy to __md_stop fixes the problem for me.

The bug was introduced in 5a409b4f56d5, the fixes should go to
4.18+

Cc: Xiao Ni 
Fixes: 5a409b4f56d5 ("MD: fix lock contention for flush bios")
Signed-off-by: Jack Wang 
---
 drivers/md/md.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4c0f3e0331d5..feb6145097da 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5904,14 +5904,6 @@ static void __md_stop(struct mddev *mddev)
mddev->to_remove = _redundancy_group;
module_put(pers->owner);
clear_bit(MD_RECOVERY_FROZEN, >recovery);
-}
-
-void md_stop(struct mddev *mddev)
-{
-   /* stop the array and free an attached data structures.
-* This is called from dm-raid
-*/
-   __md_stop(mddev);
if (mddev->flush_bio_pool) {
mempool_destroy(mddev->flush_bio_pool);
mddev->flush_bio_pool = NULL;
@@ -5920,6 +5912,14 @@ void md_stop(struct mddev *mddev)
mempool_destroy(mddev->flush_pool);
mddev->flush_pool = NULL;
}
+}
+
+void md_stop(struct mddev *mddev)
+{
+   /* stop the array and free an attached data structures.
+* This is called from dm-raid
+*/
+   __md_stop(mddev);
bioset_exit(>bio_set);
bioset_exit(>sync_set);
 }
-- 
2.7.4



[PATCH] lib: memcmp optimization

2018-10-09 Thread Jack Wang
From: Florian-Ewald Mueller 

During testing, I have configured 128 md/raid1's and, while under
heavy IO, I started a check on each of them
(echo check > /sys/block/mdx/md/sync_action).

The CPU utilization went through the ceiling and when looking for
the cause (with 'perf top'). I've discovered that ~50% of the time
was spend in memcmp() called from process_checks().

With this patch applied, it drops to 4% - 10%.

Signed-off-by: Florian-Ewald Mueller 
[jwang: reformat the commit message]
Signed-off-by: Jack Wang 
---
 lib/string.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/string.c b/lib/string.c
index 2c0900a5d51a..932ef9af2baa 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -852,7 +852,7 @@ EXPORT_SYMBOL(memmove);
  * @count: The size of the area.
  */
 #undef memcmp
-__visible int memcmp(const void *cs, const void *ct, size_t count)
+static inline int __memcmp(const void *cs, const void *ct, size_t count)
 {
const unsigned char *su1, *su2;
int res = 0;
@@ -862,6 +862,20 @@ __visible int memcmp(const void *cs, const void *ct, 
size_t count)
break;
return res;
 }
+__visible int memcmp(const void *cs, const void *ct, size_t count)
+{
+   const uint64_t *l1p = cs;
+   const uint64_t *l2p = ct;
+
+   while (count >= sizeof(*l1p)) {
+   if (*l1p != *l2p)
+   return __memcmp(l1p, l2p, sizeof(*l1p));
+   count -= sizeof(*l1p);
+   ++l1p;
+   ++l2p;
+   }
+   return __memcmp(l1p, l2p, count);
+}
 EXPORT_SYMBOL(memcmp);
 #endif
 
-- 
2.7.4



[PATCH] lib: memcmp optimization

2018-10-09 Thread Jack Wang
From: Florian-Ewald Mueller 

During testing, I have configured 128 md/raid1's and, while under
heavy IO, I started a check on each of them
(echo check > /sys/block/mdx/md/sync_action).

The CPU utilization went through the ceiling and when looking for
the cause (with 'perf top'). I've discovered that ~50% of the time
was spend in memcmp() called from process_checks().

With this patch applied, it drops to 4% - 10%.

Signed-off-by: Florian-Ewald Mueller 
[jwang: reformat the commit message]
Signed-off-by: Jack Wang 
---
 lib/string.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/string.c b/lib/string.c
index 2c0900a5d51a..932ef9af2baa 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -852,7 +852,7 @@ EXPORT_SYMBOL(memmove);
  * @count: The size of the area.
  */
 #undef memcmp
-__visible int memcmp(const void *cs, const void *ct, size_t count)
+static inline int __memcmp(const void *cs, const void *ct, size_t count)
 {
const unsigned char *su1, *su2;
int res = 0;
@@ -862,6 +862,20 @@ __visible int memcmp(const void *cs, const void *ct, 
size_t count)
break;
return res;
 }
+__visible int memcmp(const void *cs, const void *ct, size_t count)
+{
+   const uint64_t *l1p = cs;
+   const uint64_t *l2p = ct;
+
+   while (count >= sizeof(*l1p)) {
+   if (*l1p != *l2p)
+   return __memcmp(l1p, l2p, sizeof(*l1p));
+   count -= sizeof(*l1p);
+   ++l1p;
+   ++l2p;
+   }
+   return __memcmp(l1p, l2p, count);
+}
 EXPORT_SYMBOL(memcmp);
 #endif
 
-- 
2.7.4



[PATCH] md/bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-10-08 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
Reviewed-by: Gioh Kim 

---
v3->v2: Drop the change in md_bitmap_resize, as Shaohua noticed
mddev_suspend/resume is supposed to be called with reconfig_mutex hold, it's not
the case here.
v2->v1: add reviewed-by.
---
 drivers/md/md-bitmap.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..1cd4f991792c 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



[PATCH] md/bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-10-08 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
Reviewed-by: Gioh Kim 

---
v3->v2: Drop the change in md_bitmap_resize, as Shaohua noticed
mddev_suspend/resume is supposed to be called with reconfig_mutex hold, it's not
the case here.
v2->v1: add reviewed-by.
---
 drivers/md/md-bitmap.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..1cd4f991792c 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



[PATCH] md/bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-09-27 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
Reviewed-by: Gioh Kim 

---
v2->v1: add reviewed-by.
---
 drivers/md/md-bitmap.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..c369f1753ea6 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2123,7 +2123,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
}
 
if (!init)
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 1);
+   mddev_suspend(bitmap->mddev);
 
store.file = bitmap->storage.file;
bitmap->storage.file = NULL;
@@ -2239,7 +2239,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
 
if (!init) {
md_bitmap_unplug(bitmap);
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
+   mddev_resume(bitmap->mddev);
}
ret = 0;
 err:
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



[PATCH] md/bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-09-27 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
Reviewed-by: Gioh Kim 

---
v2->v1: add reviewed-by.
---
 drivers/md/md-bitmap.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..c369f1753ea6 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2123,7 +2123,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
}
 
if (!init)
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 1);
+   mddev_suspend(bitmap->mddev);
 
store.file = bitmap->storage.file;
bitmap->storage.file = NULL;
@@ -2239,7 +2239,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
 
if (!init) {
md_bitmap_unplug(bitmap);
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
+   mddev_resume(bitmap->mddev);
}
ret = 0;
 err:
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



[PATCH] md: bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-09-13 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
---
 drivers/md/md-bitmap.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..c369f1753ea6 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2123,7 +2123,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
}
 
if (!init)
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 1);
+   mddev_suspend(bitmap->mddev);
 
store.file = bitmap->storage.file;
bitmap->storage.file = NULL;
@@ -2239,7 +2239,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
 
if (!init) {
md_bitmap_unplug(bitmap);
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
+   mddev_resume(bitmap->mddev);
}
ret = 0;
 err:
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



[PATCH] md: bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-09-13 Thread Jack Wang
From: Jack Wang 

After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.

Signed-off-by: Jack Wang 
---
 drivers/md/md-bitmap.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 2fc8c113977f..c369f1753ea6 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2123,7 +2123,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
}
 
if (!init)
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 1);
+   mddev_suspend(bitmap->mddev);
 
store.file = bitmap->storage.file;
bitmap->storage.file = NULL;
@@ -2239,7 +2239,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t 
blocks,
 
if (!init) {
md_bitmap_unplug(bitmap);
-   bitmap->mddev->pers->quiesce(bitmap->mddev, 0);
+   mddev_resume(bitmap->mddev);
}
ret = 0;
 err:
@@ -2288,9 +2288,9 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
goto out;
}
if (mddev->pers) {
-   mddev->pers->quiesce(mddev, 1);
+   mddev_suspend(mddev);
md_bitmap_destroy(mddev);
-   mddev->pers->quiesce(mddev, 0);
+   mddev_resume(mddev);
}
mddev->bitmap_info.offset = 0;
if (mddev->bitmap_info.file) {
@@ -2327,8 +2327,8 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
mddev->bitmap_info.offset = offset;
if (mddev->pers) {
struct bitmap *bitmap;
-   mddev->pers->quiesce(mddev, 1);
bitmap = md_bitmap_create(mddev, -1);
+   mddev_suspend(mddev);
if (IS_ERR(bitmap))
rv = PTR_ERR(bitmap);
else {
@@ -2337,11 +2337,12 @@ location_store(struct mddev *mddev, const char *buf, 
size_t len)
if (rv)
mddev->bitmap_info.offset = 0;
}
-   mddev->pers->quiesce(mddev, 0);
if (rv) {
md_bitmap_destroy(mddev);
+   mddev_resume(mddev);
goto out;
}
+   mddev_resume(mddev);
}
}
}
-- 
2.7.4



Re: [PATCH] KVM: VMX: fixes for vmentry_l1d_flush module parameter

2018-08-22 Thread Jack Wang
Paolo Bonzini  于2018年8月22日周三 下午4:56写道:
>
> Two bug fixes:
>
> 1) missing entries in the l1d_param array; this can cause a host crash
> if an access attempts to reach the missing entry. Future-proof the get
> function against any overflows as well.  However, the two entries
> VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
> not be accepted by the parse function, so disable them there.
>
> 2) invalid values must be rejected even if the CPU does not have the
> bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
>
> ... and a small refactoring, since the .cmd field is redundant with
> the index in the array.
>
> Reported-by: Bandan Das 
> Cc: sta...@vger.kernel.org
> Fixes: a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/vmx.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c76ca8c4befa..8dae47e7267a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -198,12 +198,14 @@
>
>  static const struct {
> const char *option;
> -   enum vmx_l1d_flush_state cmd;
> +   bool for_parse;
>  } vmentry_l1d_param[] = {
> -   {"auto",VMENTER_L1D_FLUSH_AUTO},
> -   {"never",   VMENTER_L1D_FLUSH_NEVER},
> -   {"cond",VMENTER_L1D_FLUSH_COND},
> -   {"always",  VMENTER_L1D_FLUSH_ALWAYS},
> +   [VMENTER_L1D_FLUSH_AUTO] = {"auto", true},
> +   [VMENTER_L1D_FLUSH_NEVER]= {"never", true},
> +   [VMENTER_L1D_FLUSH_COND] = {"cond", true},
> +   [VMENTER_L1D_FLUSH_ALWAYS]   = {"always", true},
> +   [VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
> +   [VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
>  };
>
>  #define L1D_CACHE_ORDER 4
> @@ -287,8 +289,9 @@ static int vmentry_l1d_flush_parse(const char *s)
>
> if (s) {
> for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
> -   if (sysfs_streq(s, vmentry_l1d_param[i].option))
> -   return vmentry_l1d_param[i].cmd;
> +   if (vmentry_l1d_param[i].for_parse &&
> +   sysfs_streq(s, vmentry_l1d_param[i].option))
> +   return i;
> }
> }
> return -EINVAL;
> @@ -298,13 +301,13 @@ static int vmentry_l1d_flush_set(const char *s, const 
> struct kernel_param *kp)
>  {
> int l1tf, ret;
>
> -   if (!boot_cpu_has(X86_BUG_L1TF))
> -   return 0;
> -
> l1tf = vmentry_l1d_flush_parse(s);
> if (l1tf < 0)
> return l1tf;
>
> +   if (!boot_cpu_has(X86_BUG_L1TF))
> +   return 0;
> +
> /*
>  * Has vmx_init() run already? If not then this is the pre init
>  * parameter parsing. In that case just store the value and let
> @@ -324,6 +327,9 @@ static int vmentry_l1d_flush_set(const char *s, const 
> struct kernel_param *kp)
>
>  static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
>  {
> +   if (WARN_ON_ONCE(l1tf_vmx_mitigation >= 
> ARRAY_SIZE(vmentry_l1d_param)))
> +   return sprintf(s, "???\n");
> +
> return sprintf(s, "%s\n", 
> vmentry_l1d_param[l1tf_vmx_mitigation].option);
>  }
>
> --
> 1.8.3.1
>
Tested-by: Jack Wang 
Thanks,

Jack


Re: [PATCH] KVM: VMX: fixes for vmentry_l1d_flush module parameter

2018-08-22 Thread Jack Wang
Paolo Bonzini  于2018年8月22日周三 下午4:56写道:
>
> Two bug fixes:
>
> 1) missing entries in the l1d_param array; this can cause a host crash
> if an access attempts to reach the missing entry. Future-proof the get
> function against any overflows as well.  However, the two entries
> VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
> not be accepted by the parse function, so disable them there.
>
> 2) invalid values must be rejected even if the CPU does not have the
> bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
>
> ... and a small refactoring, since the .cmd field is redundant with
> the index in the array.
>
> Reported-by: Bandan Das 
> Cc: sta...@vger.kernel.org
> Fixes: a7b9020b06ec6d7c3f3b0d4ef1a9eba12654f4f7
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/vmx.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c76ca8c4befa..8dae47e7267a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -198,12 +198,14 @@
>
>  static const struct {
> const char *option;
> -   enum vmx_l1d_flush_state cmd;
> +   bool for_parse;
>  } vmentry_l1d_param[] = {
> -   {"auto",VMENTER_L1D_FLUSH_AUTO},
> -   {"never",   VMENTER_L1D_FLUSH_NEVER},
> -   {"cond",VMENTER_L1D_FLUSH_COND},
> -   {"always",  VMENTER_L1D_FLUSH_ALWAYS},
> +   [VMENTER_L1D_FLUSH_AUTO] = {"auto", true},
> +   [VMENTER_L1D_FLUSH_NEVER]= {"never", true},
> +   [VMENTER_L1D_FLUSH_COND] = {"cond", true},
> +   [VMENTER_L1D_FLUSH_ALWAYS]   = {"always", true},
> +   [VMENTER_L1D_FLUSH_EPT_DISABLED] = {"EPT disabled", false},
> +   [VMENTER_L1D_FLUSH_NOT_REQUIRED] = {"not required", false},
>  };
>
>  #define L1D_CACHE_ORDER 4
> @@ -287,8 +289,9 @@ static int vmentry_l1d_flush_parse(const char *s)
>
> if (s) {
> for (i = 0; i < ARRAY_SIZE(vmentry_l1d_param); i++) {
> -   if (sysfs_streq(s, vmentry_l1d_param[i].option))
> -   return vmentry_l1d_param[i].cmd;
> +   if (vmentry_l1d_param[i].for_parse &&
> +   sysfs_streq(s, vmentry_l1d_param[i].option))
> +   return i;
> }
> }
> return -EINVAL;
> @@ -298,13 +301,13 @@ static int vmentry_l1d_flush_set(const char *s, const 
> struct kernel_param *kp)
>  {
> int l1tf, ret;
>
> -   if (!boot_cpu_has(X86_BUG_L1TF))
> -   return 0;
> -
> l1tf = vmentry_l1d_flush_parse(s);
> if (l1tf < 0)
> return l1tf;
>
> +   if (!boot_cpu_has(X86_BUG_L1TF))
> +   return 0;
> +
> /*
>  * Has vmx_init() run already? If not then this is the pre init
>  * parameter parsing. In that case just store the value and let
> @@ -324,6 +327,9 @@ static int vmentry_l1d_flush_set(const char *s, const 
> struct kernel_param *kp)
>
>  static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
>  {
> +   if (WARN_ON_ONCE(l1tf_vmx_mitigation >= 
> ARRAY_SIZE(vmentry_l1d_param)))
> +   return sprintf(s, "???\n");
> +
> return sprintf(s, "%s\n", 
> vmentry_l1d_param[l1tf_vmx_mitigation].option);
>  }
>
> --
> 1.8.3.1
>
Tested-by: Jack Wang 
Thanks,

Jack


[PATCH v2] block: ratelimite pr_err on IO path

2018-04-12 Thread Jack Wang
From: Jack Wang <jinpu.w...@profitbricks.com>

This avoid soft lockup below:
[ 2328.328429] Call Trace:
[ 2328.328433]  vprintk_emit+0x229/0x2e0
[ 2328.328436]  ? t10_pi_type3_verify_ip+0x20/0x20
[ 2328.328437]  printk+0x52/0x6e
[ 2328.328439]  t10_pi_verify+0x9e/0xf0
[ 2328.328441]  bio_integrity_process+0x12e/0x220
[ 2328.328442]  ? t10_pi_type1_verify_crc+0x20/0x20
[ 2328.328443]  bio_integrity_verify_fn+0xde/0x140
[ 2328.328447]  process_one_work+0x13f/0x370
[ 2328.328449]  worker_thread+0x62/0x3d0
[ 2328.328450]  ? rescuer_thread+0x2f0/0x2f0
[ 2328.328452]  kthread+0x116/0x150
[ 2328.328454]  ? __kthread_parkme+0x70/0x70
[ 2328.328457]  ret_from_fork+0x35/0x40

Signed-off-by: Jack Wang <jinpu.w...@profitbricks.com>
---
v2: keep the message in same line as Robert and coding style suggested

 block/t10-pi.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index a98db38..6faf8c1 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -84,10 +84,11 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter 
*iter,
 
if (be32_to_cpu(pi->ref_tag) !=
lower_32_bits(iter->seed)) {
-   pr_err("%s: ref tag error at location %llu " \
-  "(rcvd %u)\n", iter->disk_name,
-  (unsigned long long)
-  iter->seed, be32_to_cpu(pi->ref_tag));
+   pr_err_ratelimited("%s: ref tag error at 
location %llu (rcvd %u)\n",
+  iter->disk_name,
+  (unsigned long long)
+  iter->seed,
+  be32_to_cpu(pi->ref_tag));
return BLK_STS_PROTECTION;
}
break;
@@ -101,10 +102,11 @@ static blk_status_t t10_pi_verify(struct 
blk_integrity_iter *iter,
csum = fn(iter->data_buf, iter->interval);
 
if (pi->guard_tag != csum) {
-   pr_err("%s: guard tag error at sector %llu " \
-  "(rcvd %04x, want %04x)\n", iter->disk_name,
-  (unsigned long long)iter->seed,
-  be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
+   pr_err_ratelimited("%s: guard tag error at sector %llu 
(rcvd %04x, want %04x)\n",
+  iter->disk_name,
+  (unsigned long long)iter->seed,
+  be16_to_cpu(pi->guard_tag),
+  be16_to_cpu(csum));
return BLK_STS_PROTECTION;
}
 
-- 
2.7.4



[PATCH v2] block: ratelimite pr_err on IO path

2018-04-12 Thread Jack Wang
From: Jack Wang 

This avoid soft lockup below:
[ 2328.328429] Call Trace:
[ 2328.328433]  vprintk_emit+0x229/0x2e0
[ 2328.328436]  ? t10_pi_type3_verify_ip+0x20/0x20
[ 2328.328437]  printk+0x52/0x6e
[ 2328.328439]  t10_pi_verify+0x9e/0xf0
[ 2328.328441]  bio_integrity_process+0x12e/0x220
[ 2328.328442]  ? t10_pi_type1_verify_crc+0x20/0x20
[ 2328.328443]  bio_integrity_verify_fn+0xde/0x140
[ 2328.328447]  process_one_work+0x13f/0x370
[ 2328.328449]  worker_thread+0x62/0x3d0
[ 2328.328450]  ? rescuer_thread+0x2f0/0x2f0
[ 2328.328452]  kthread+0x116/0x150
[ 2328.328454]  ? __kthread_parkme+0x70/0x70
[ 2328.328457]  ret_from_fork+0x35/0x40

Signed-off-by: Jack Wang 
---
v2: keep the message in same line as Robert and coding style suggested

 block/t10-pi.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index a98db38..6faf8c1 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -84,10 +84,11 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter 
*iter,
 
if (be32_to_cpu(pi->ref_tag) !=
lower_32_bits(iter->seed)) {
-   pr_err("%s: ref tag error at location %llu " \
-  "(rcvd %u)\n", iter->disk_name,
-  (unsigned long long)
-  iter->seed, be32_to_cpu(pi->ref_tag));
+   pr_err_ratelimited("%s: ref tag error at 
location %llu (rcvd %u)\n",
+  iter->disk_name,
+  (unsigned long long)
+  iter->seed,
+  be32_to_cpu(pi->ref_tag));
return BLK_STS_PROTECTION;
}
break;
@@ -101,10 +102,11 @@ static blk_status_t t10_pi_verify(struct 
blk_integrity_iter *iter,
csum = fn(iter->data_buf, iter->interval);
 
if (pi->guard_tag != csum) {
-   pr_err("%s: guard tag error at sector %llu " \
-  "(rcvd %04x, want %04x)\n", iter->disk_name,
-  (unsigned long long)iter->seed,
-  be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
+   pr_err_ratelimited("%s: guard tag error at sector %llu 
(rcvd %04x, want %04x)\n",
+  iter->disk_name,
+  (unsigned long long)iter->seed,
+  be16_to_cpu(pi->guard_tag),
+  be16_to_cpu(csum));
return BLK_STS_PROTECTION;
}
 
-- 
2.7.4



[PATCH] block: ratelimite pr_err on IO path

2018-04-11 Thread Jack Wang
From: Jack Wang <jinpu.w...@profitbricks.com>

This avoid soft lockup below:
[ 2328.328429] Call Trace:
[ 2328.328433]  vprintk_emit+0x229/0x2e0
[ 2328.328436]  ? t10_pi_type3_verify_ip+0x20/0x20
[ 2328.328437]  printk+0x52/0x6e
[ 2328.328439]  t10_pi_verify+0x9e/0xf0
[ 2328.328441]  bio_integrity_process+0x12e/0x220
[ 2328.328442]  ? t10_pi_type1_verify_crc+0x20/0x20
[ 2328.328443]  bio_integrity_verify_fn+0xde/0x140
[ 2328.328447]  process_one_work+0x13f/0x370
[ 2328.328449]  worker_thread+0x62/0x3d0
[ 2328.328450]  ? rescuer_thread+0x2f0/0x2f0
[ 2328.328452]  kthread+0x116/0x150
[ 2328.328454]  ? __kthread_parkme+0x70/0x70
[ 2328.328457]  ret_from_fork+0x35/0x40

Signed-off-by: Jack Wang <jinpu.w...@profitbricks.com>
---
 block/t10-pi.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index a98db38..35967d5 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -84,10 +84,12 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter 
*iter,
 
if (be32_to_cpu(pi->ref_tag) !=
lower_32_bits(iter->seed)) {
-   pr_err("%s: ref tag error at location %llu " \
-  "(rcvd %u)\n", iter->disk_name,
-  (unsigned long long)
-  iter->seed, be32_to_cpu(pi->ref_tag));
+   pr_err_ratelimited("%s: ref tag error at "
+  "location %llu (rcvd %u)\n",
+  iter->disk_name,
+  (unsigned long long)
+  iter->seed,
+  be32_to_cpu(pi->ref_tag));
return BLK_STS_PROTECTION;
}
break;
@@ -101,10 +103,12 @@ static blk_status_t t10_pi_verify(struct 
blk_integrity_iter *iter,
csum = fn(iter->data_buf, iter->interval);
 
if (pi->guard_tag != csum) {
-   pr_err("%s: guard tag error at sector %llu " \
-  "(rcvd %04x, want %04x)\n", iter->disk_name,
-  (unsigned long long)iter->seed,
-  be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
+   pr_err_ratelimited("%s: guard tag error at sector %llu "
+  "(rcvd %04x, want %04x)\n",
+  iter->disk_name,
+  (unsigned long long)iter->seed,
+  be16_to_cpu(pi->guard_tag),
+  be16_to_cpu(csum));
return BLK_STS_PROTECTION;
}
 
-- 
2.7.4



[PATCH] block: ratelimite pr_err on IO path

2018-04-11 Thread Jack Wang
From: Jack Wang 

This avoid soft lockup below:
[ 2328.328429] Call Trace:
[ 2328.328433]  vprintk_emit+0x229/0x2e0
[ 2328.328436]  ? t10_pi_type3_verify_ip+0x20/0x20
[ 2328.328437]  printk+0x52/0x6e
[ 2328.328439]  t10_pi_verify+0x9e/0xf0
[ 2328.328441]  bio_integrity_process+0x12e/0x220
[ 2328.328442]  ? t10_pi_type1_verify_crc+0x20/0x20
[ 2328.328443]  bio_integrity_verify_fn+0xde/0x140
[ 2328.328447]  process_one_work+0x13f/0x370
[ 2328.328449]  worker_thread+0x62/0x3d0
[ 2328.328450]  ? rescuer_thread+0x2f0/0x2f0
[ 2328.328452]  kthread+0x116/0x150
[ 2328.328454]  ? __kthread_parkme+0x70/0x70
[ 2328.328457]  ret_from_fork+0x35/0x40

Signed-off-by: Jack Wang 
---
 block/t10-pi.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/block/t10-pi.c b/block/t10-pi.c
index a98db38..35967d5 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -84,10 +84,12 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter 
*iter,
 
if (be32_to_cpu(pi->ref_tag) !=
lower_32_bits(iter->seed)) {
-   pr_err("%s: ref tag error at location %llu " \
-  "(rcvd %u)\n", iter->disk_name,
-  (unsigned long long)
-  iter->seed, be32_to_cpu(pi->ref_tag));
+   pr_err_ratelimited("%s: ref tag error at "
+  "location %llu (rcvd %u)\n",
+  iter->disk_name,
+  (unsigned long long)
+  iter->seed,
+  be32_to_cpu(pi->ref_tag));
return BLK_STS_PROTECTION;
}
break;
@@ -101,10 +103,12 @@ static blk_status_t t10_pi_verify(struct 
blk_integrity_iter *iter,
csum = fn(iter->data_buf, iter->interval);
 
if (pi->guard_tag != csum) {
-   pr_err("%s: guard tag error at sector %llu " \
-  "(rcvd %04x, want %04x)\n", iter->disk_name,
-  (unsigned long long)iter->seed,
-  be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
+   pr_err_ratelimited("%s: guard tag error at sector %llu "
+  "(rcvd %04x, want %04x)\n",
+  iter->disk_name,
+  (unsigned long long)iter->seed,
+  be16_to_cpu(pi->guard_tag),
+  be16_to_cpu(csum));
return BLK_STS_PROTECTION;
}
 
-- 
2.7.4



Re: [PATCH v4 05/11] libsas: Use dynamic alloced work to avoid sas event lost

2017-09-07 Thread Jack Wang
PORTE_BYTES_DMAED, >port_events_pending);
> -
> sas_form_port(phy);
>  }
>
> @@ -273,8 +271,6 @@ void sas_porte_broadcast_rcvd(struct work_struct *work)
> unsigned long flags;
> u32 prim;
>
> -   clear_bit(PORTE_BROADCAST_RCVD, >port_events_pending);
> -
> spin_lock_irqsave(>sas_prim_lock, flags);
> prim = phy->sas_prim;
> spin_unlock_irqrestore(>sas_prim_lock, flags);
> @@ -288,8 +284,6 @@ void sas_porte_link_reset_err(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_LINK_RESET_ERR, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -298,8 +292,6 @@ void sas_porte_timer_event(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_TIMER_EVENT, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -308,8 +300,6 @@ void sas_porte_hard_reset(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_HARD_RESET, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -353,3 +343,11 @@ void sas_unregister_ports(struct sas_ha_struct *sas_ha)
> sas_deform_port(sas_ha->sas_phy[i], 0);
>
>  }
> +
> +const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
> +   [PORTE_BYTES_DMAED] = sas_porte_bytes_dmaed,
> +   [PORTE_BROADCAST_RCVD] = sas_porte_broadcast_rcvd,
> +   [PORTE_LINK_RESET_ERR] = sas_porte_link_reset_err,
> +   [PORTE_TIMER_EVENT] = sas_porte_timer_event,
> +   [PORTE_HARD_RESET] = sas_porte_hard_reset,
> +};
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 99f32b5..c80321b 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -292,6 +292,7 @@ struct asd_sas_port {
>  struct asd_sas_event {
> struct sas_work work;
> struct asd_sas_phy *phy;
> +   int event;
>  };
>
>  static inline struct asd_sas_event *to_asd_sas_event(struct work_struct 
> *work)
> @@ -301,17 +302,20 @@ static inline struct asd_sas_event 
> *to_asd_sas_event(struct work_struct *work)
> return ev;
>  }
>
> +static inline void INIT_SAS_EVENT(struct asd_sas_event *ev, void 
> (*fn)(struct work_struct *),
> +   struct asd_sas_phy *phy, int event)
> +{
> +   INIT_SAS_WORK(>work, fn);
> +   ev->phy = phy;
> +   ev->event = event;
> +}
> +
> +
>  /* The phy pretty much is controlled by the LLDD.
>   * The class only reads those fields.
>   */
>  struct asd_sas_phy {
>  /* private: */
> -   struct asd_sas_event   port_events[PORT_NUM_EVENTS];
> -   struct asd_sas_event   phy_events[PHY_NUM_EVENTS];
> -
> -   unsigned long port_events_pending;
> -   unsigned long phy_events_pending;
> -
> int error;
> int suspended;
>
> --
> 2.5.0
>
Looks good, thanks!
Reviewed-by: Jack Wang <jinpu.w...@profitbricks.com>


Re: [PATCH v4 05/11] libsas: Use dynamic alloced work to avoid sas event lost

2017-09-07 Thread Jack Wang
  clear_bit(PORTE_BROADCAST_RCVD, >port_events_pending);
> -
> spin_lock_irqsave(>sas_prim_lock, flags);
> prim = phy->sas_prim;
> spin_unlock_irqrestore(>sas_prim_lock, flags);
> @@ -288,8 +284,6 @@ void sas_porte_link_reset_err(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_LINK_RESET_ERR, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -298,8 +292,6 @@ void sas_porte_timer_event(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_TIMER_EVENT, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -308,8 +300,6 @@ void sas_porte_hard_reset(struct work_struct *work)
> struct asd_sas_event *ev = to_asd_sas_event(work);
> struct asd_sas_phy *phy = ev->phy;
>
> -   clear_bit(PORTE_HARD_RESET, >port_events_pending);
> -
> sas_deform_port(phy, 1);
>  }
>
> @@ -353,3 +343,11 @@ void sas_unregister_ports(struct sas_ha_struct *sas_ha)
> sas_deform_port(sas_ha->sas_phy[i], 0);
>
>  }
> +
> +const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
> +   [PORTE_BYTES_DMAED] = sas_porte_bytes_dmaed,
> +   [PORTE_BROADCAST_RCVD] = sas_porte_broadcast_rcvd,
> +   [PORTE_LINK_RESET_ERR] = sas_porte_link_reset_err,
> +   [PORTE_TIMER_EVENT] = sas_porte_timer_event,
> +   [PORTE_HARD_RESET] = sas_porte_hard_reset,
> +};
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 99f32b5..c80321b 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -292,6 +292,7 @@ struct asd_sas_port {
>  struct asd_sas_event {
> struct sas_work work;
> struct asd_sas_phy *phy;
> +   int event;
>  };
>
>  static inline struct asd_sas_event *to_asd_sas_event(struct work_struct 
> *work)
> @@ -301,17 +302,20 @@ static inline struct asd_sas_event 
> *to_asd_sas_event(struct work_struct *work)
> return ev;
>  }
>
> +static inline void INIT_SAS_EVENT(struct asd_sas_event *ev, void 
> (*fn)(struct work_struct *),
> +   struct asd_sas_phy *phy, int event)
> +{
> +   INIT_SAS_WORK(>work, fn);
> +   ev->phy = phy;
> +   ev->event = event;
> +}
> +
> +
>  /* The phy pretty much is controlled by the LLDD.
>   * The class only reads those fields.
>   */
>  struct asd_sas_phy {
>  /* private: */
> -   struct asd_sas_event   port_events[PORT_NUM_EVENTS];
> -   struct asd_sas_event   phy_events[PHY_NUM_EVENTS];
> -
> -   unsigned long port_events_pending;
> -   unsigned long phy_events_pending;
> -
> int error;
> int suspended;
>
> --
> 2.5.0
>
Looks good, thanks!
Reviewed-by: Jack Wang 


Re: [PATCH v4 07/11] libsas: make the event threshold configurable

2017-09-07 Thread Jack Wang
2017-09-06 11:15 GMT+02:00 Jason Yan <yanai...@huawei.com>:
> Add a sysfs attr that LLDD can configure it for every host. We made
> a example in hisi_sas. Other LLDDs using libsas can implement it if
> they want.
>
> Suggested-by: Hannes Reinecke <h...@suse.com>
> Signed-off-by: Jason Yan <yanai...@huawei.com>
> CC: John Garry <john.ga...@huawei.com>
> CC: Johannes Thumshirn <jthumsh...@suse.de>
> CC: Ewan Milne <emi...@redhat.com>
> CC: Christoph Hellwig <h...@lst.de>
> CC: Tomas Henzl <the...@redhat.com>
> CC: Dan Williams <dan.j.willi...@intel.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c |  6 ++
>  drivers/scsi/libsas/sas_init.c| 27 +++
>  include/scsi/libsas.h |  1 +
>  3 files changed, 34 insertions(+)
>
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 5e47abb..9eceed1 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -1470,6 +1470,11 @@ EXPORT_SYMBOL_GPL(hisi_sas_rescan_topology);
>  struct scsi_transport_template *hisi_sas_stt;
>  EXPORT_SYMBOL_GPL(hisi_sas_stt);
>
> +struct device_attribute *host_attrs[] = {
> +_attr_phy_event_threshold,
> +NULL,
> +};
> +
>  static struct scsi_host_template _hisi_sas_sht = {
> .module = THIS_MODULE,
> .name   = DRV_NAME,
> @@ -1489,6 +1494,7 @@ static struct scsi_host_template _hisi_sas_sht = {
> .eh_bus_reset_handler   = sas_eh_bus_reset_handler,
> .target_destroy = sas_target_destroy,
> .ioctl  = sas_ioctl,
> +   .shost_attrs= host_attrs,
>  };
>  struct scsi_host_template *hisi_sas_sht = &_hisi_sas_sht;
>  EXPORT_SYMBOL_GPL(hisi_sas_sht);
> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
> index b1e03d5..e2d98a8 100644
> --- a/drivers/scsi/libsas/sas_init.c
> +++ b/drivers/scsi/libsas/sas_init.c
> @@ -537,6 +537,33 @@ static struct sas_function_template sft = {
> .smp_handler = sas_smp_handler,
>  };
>
> +static inline ssize_t phy_event_threshold_show(struct device *dev,
> +   struct device_attribute *attr, char *buf)
> +{
> +   struct Scsi_Host *shost = class_to_shost(dev);
> +   struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);
> +
> +   return scnprintf(buf, PAGE_SIZE, "%u\n", sha->event_thres);
> +}
> +
> +static inline ssize_t phy_event_threshold_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> +   struct Scsi_Host *shost = class_to_shost(dev);
> +   struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);
> +
> +   sha->event_thres = simple_strtol(buf, NULL, 10);
> +
> +   return count;
> +}
> +
> +DEVICE_ATTR(phy_event_threshold,
> + S_IRUGO|S_IWUSR,
> + phy_event_threshold_show,
> + phy_event_threshold_store);
> +EXPORT_SYMBOL_GPL(dev_attr_phy_event_threshold);
> +
>  struct scsi_transport_template *
>  sas_domain_attach_transport(struct sas_domain_function_template *dft)
>  {
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 2fa0b13..08c1c32 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -679,6 +679,7 @@ extern int sas_bios_param(struct scsi_device *,
>   sector_t capacity, int *hsc);
>  extern struct scsi_transport_template *
>  sas_domain_attach_transport(struct sas_domain_function_template *);
> +extern struct device_attribute dev_attr_phy_event_threshold;
>
>  int  sas_discover_root_expander(struct domain_device *);
>
> --
> 2.5.0
>

Looks good, thanks!
Reviewed-by: Jack Wang <jinpu.w...@profitbricks.com>


Re: [PATCH v4 07/11] libsas: make the event threshold configurable

2017-09-07 Thread Jack Wang
2017-09-06 11:15 GMT+02:00 Jason Yan :
> Add a sysfs attr that LLDD can configure it for every host. We made
> a example in hisi_sas. Other LLDDs using libsas can implement it if
> they want.
>
> Suggested-by: Hannes Reinecke 
> Signed-off-by: Jason Yan 
> CC: John Garry 
> CC: Johannes Thumshirn 
> CC: Ewan Milne 
> CC: Christoph Hellwig 
> CC: Tomas Henzl 
> CC: Dan Williams 
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c |  6 ++
>  drivers/scsi/libsas/sas_init.c| 27 +++
>  include/scsi/libsas.h |  1 +
>  3 files changed, 34 insertions(+)
>
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c 
> b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 5e47abb..9eceed1 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -1470,6 +1470,11 @@ EXPORT_SYMBOL_GPL(hisi_sas_rescan_topology);
>  struct scsi_transport_template *hisi_sas_stt;
>  EXPORT_SYMBOL_GPL(hisi_sas_stt);
>
> +struct device_attribute *host_attrs[] = {
> +_attr_phy_event_threshold,
> +NULL,
> +};
> +
>  static struct scsi_host_template _hisi_sas_sht = {
> .module = THIS_MODULE,
> .name   = DRV_NAME,
> @@ -1489,6 +1494,7 @@ static struct scsi_host_template _hisi_sas_sht = {
> .eh_bus_reset_handler   = sas_eh_bus_reset_handler,
> .target_destroy = sas_target_destroy,
> .ioctl  = sas_ioctl,
> +   .shost_attrs= host_attrs,
>  };
>  struct scsi_host_template *hisi_sas_sht = &_hisi_sas_sht;
>  EXPORT_SYMBOL_GPL(hisi_sas_sht);
> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
> index b1e03d5..e2d98a8 100644
> --- a/drivers/scsi/libsas/sas_init.c
> +++ b/drivers/scsi/libsas/sas_init.c
> @@ -537,6 +537,33 @@ static struct sas_function_template sft = {
> .smp_handler = sas_smp_handler,
>  };
>
> +static inline ssize_t phy_event_threshold_show(struct device *dev,
> +   struct device_attribute *attr, char *buf)
> +{
> +   struct Scsi_Host *shost = class_to_shost(dev);
> +   struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);
> +
> +   return scnprintf(buf, PAGE_SIZE, "%u\n", sha->event_thres);
> +}
> +
> +static inline ssize_t phy_event_threshold_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> +   struct Scsi_Host *shost = class_to_shost(dev);
> +   struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);
> +
> +   sha->event_thres = simple_strtol(buf, NULL, 10);
> +
> +   return count;
> +}
> +
> +DEVICE_ATTR(phy_event_threshold,
> + S_IRUGO|S_IWUSR,
> + phy_event_threshold_show,
> + phy_event_threshold_store);
> +EXPORT_SYMBOL_GPL(dev_attr_phy_event_threshold);
> +
>  struct scsi_transport_template *
>  sas_domain_attach_transport(struct sas_domain_function_template *dft)
>  {
> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
> index 2fa0b13..08c1c32 100644
> --- a/include/scsi/libsas.h
> +++ b/include/scsi/libsas.h
> @@ -679,6 +679,7 @@ extern int sas_bios_param(struct scsi_device *,
>   sector_t capacity, int *hsc);
>  extern struct scsi_transport_template *
>  sas_domain_attach_transport(struct sas_domain_function_template *);
> +extern struct device_attribute dev_attr_phy_event_threshold;
>
>  int  sas_discover_root_expander(struct domain_device *);
>
> --
> 2.5.0
>

Looks good, thanks!
Reviewed-by: Jack Wang 


Re: Getting "Wrong diagnostic page; asked for 7 got 0" error message on HBA's virtual SES device

2017-03-15 Thread Jack Wang
2017-03-13 7:43 GMT+01:00 Sreekanth Reddy <sreekanth.re...@broadcom.com>:
> Hi,
>
> Our LSI(Broadcom) SAS3.5 HBA device's support virtual SES device.
>
> Whenever we load the mpt3sas driver then we are observing below error message,
>
> "Wrong diagnostic page; asked for 7 got 0"
>
> Our virtual SES device doesn't support Diagnostic page 7, it supports
> only below diagnostic pages,
>
> • 00h List of Supported Diagnostic Pages
> • 01h SES Configuration Diagnostic Page
> • 02h SES Enclosure Control/Enclosure Status Diagnostic Page
> • 0Ah SES Additional Element Status Diagnostic Page
>
> Page Code 07 is Element Descriptor Diagnostic Page
> Per SES3 spec (see table 10) this page is optional and not supported
> by our internal VSES.
>
> But 'ses' kernel module is issuing a RECEIVE_DIAGNOSTIC command for
> diagnostic page 7 without checking whether target device supports this
> page or not (i.e. without looking into 00h page) as shown below,
>
> result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE);
> if (result)
> goto simple_populate;
>
> As the virtual SES devices doesn't support this page 7, so it has
> failed this RECEIVE_DIAGNOSTIC command with illegal request sense key
> as shown below,
>
> ses 11:0:0:0: tag#0 Send: scmd 0x883fee898000
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> mpt3sas_cm1:sas_address(0x510600b012345600), phy(16)
> mpt3sas_cm1:enclosure_logical_id(0x500605b012345600),slot(16)
> mpt3sas_cm1:enclosure level(0x), connector name( )
> mpt3sas_cm1:handle(0x0011), ioc_status(success)(0x), smid(1)
> mpt3sas_cm1:request_len(32), underflow(0), resid(32)
> mpt3sas_cm1:tag(0), transfer_count(0), sc->result(0x0002)
> mpt3sas_cm1:scsi_status(check condition)(0x02),
> scsi_state(autosense valid )(0x01)
> mpt3sas_cm1:[sense_key,asc,ascq]: [0x05,0x26,0x00], count(18)
> mpt3sas_cm1: log_info(0x31200205): originator(PL), code(0x20), 
> sub_code(0x0205)
> ses 11:0:0:0: tag#0 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE
> driverbyte=DRIVER_OK
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> ses 11:0:0:0: tag#0 Sense Key : Illegal Request [current]
> ses 11:0:0:0: tag#0 Add. Sense: Invalid field in parameter list
> ses 11:0:0:0: tag#0 scsi host busy 1 failed 0
> ses 11:0:0:0: Notifying upper driver of completion (result 812)
> ses 11:0:0:0: tag#0 0 sectors total, 32 bytes done.
> ses 11:0:0:0: Wrong diagnostic page; asked for 7 got 0
>
> As the command is failed with illegal request sense key, so the buffer
> used In function 'ses_recv_diag' will be not updated and so it will
> have all zero's. so the page_code will be zero and we observe below
> wrong error message even though vSES device has failed the command
> with illegal request error message and it has not returned any wrong
> diagnostic page.
>
> sdev_printk(KERN_ERR, sdev,
> "Wrong diagnostic page; asked for %d got %u\n",
> page_code, recv_page_code);
>
> Thanks,
> Sreekanth

Hi Sreekanth,

To me, we should fix ses module to check  00h List of Supported
Diagnostic Pages first before ask for 07h.
Add James in CC.

Cheers,
Jack Wang


Re: Getting "Wrong diagnostic page; asked for 7 got 0" error message on HBA's virtual SES device

2017-03-15 Thread Jack Wang
2017-03-13 7:43 GMT+01:00 Sreekanth Reddy :
> Hi,
>
> Our LSI(Broadcom) SAS3.5 HBA device's support virtual SES device.
>
> Whenever we load the mpt3sas driver then we are observing below error message,
>
> "Wrong diagnostic page; asked for 7 got 0"
>
> Our virtual SES device doesn't support Diagnostic page 7, it supports
> only below diagnostic pages,
>
> • 00h List of Supported Diagnostic Pages
> • 01h SES Configuration Diagnostic Page
> • 02h SES Enclosure Control/Enclosure Status Diagnostic Page
> • 0Ah SES Additional Element Status Diagnostic Page
>
> Page Code 07 is Element Descriptor Diagnostic Page
> Per SES3 spec (see table 10) this page is optional and not supported
> by our internal VSES.
>
> But 'ses' kernel module is issuing a RECEIVE_DIAGNOSTIC command for
> diagnostic page 7 without checking whether target device supports this
> page or not (i.e. without looking into 00h page) as shown below,
>
> result = ses_recv_diag(sdev, 7, hdr_buf, INIT_ALLOC_SIZE);
> if (result)
> goto simple_populate;
>
> As the virtual SES devices doesn't support this page 7, so it has
> failed this RECEIVE_DIAGNOSTIC command with illegal request sense key
> as shown below,
>
> ses 11:0:0:0: tag#0 Send: scmd 0x883fee898000
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> mpt3sas_cm1:sas_address(0x510600b012345600), phy(16)
> mpt3sas_cm1:enclosure_logical_id(0x500605b012345600),slot(16)
> mpt3sas_cm1:enclosure level(0x), connector name( )
> mpt3sas_cm1:handle(0x0011), ioc_status(success)(0x), smid(1)
> mpt3sas_cm1:request_len(32), underflow(0), resid(32)
> mpt3sas_cm1:tag(0), transfer_count(0), sc->result(0x0002)
> mpt3sas_cm1:scsi_status(check condition)(0x02),
> scsi_state(autosense valid )(0x01)
> mpt3sas_cm1:[sense_key,asc,ascq]: [0x05,0x26,0x00], count(18)
> mpt3sas_cm1: log_info(0x31200205): originator(PL), code(0x20), 
> sub_code(0x0205)
> ses 11:0:0:0: tag#0 Done: SUCCESS Result: hostbyte=DID_TARGET_FAILURE
> driverbyte=DRIVER_OK
> ses 11:0:0:0: tag#0 CDB: Receive Diagnostic 1c 01 07 00 20 00
> ses 11:0:0:0: tag#0 Sense Key : Illegal Request [current]
> ses 11:0:0:0: tag#0 Add. Sense: Invalid field in parameter list
> ses 11:0:0:0: tag#0 scsi host busy 1 failed 0
> ses 11:0:0:0: Notifying upper driver of completion (result 812)
> ses 11:0:0:0: tag#0 0 sectors total, 32 bytes done.
> ses 11:0:0:0: Wrong diagnostic page; asked for 7 got 0
>
> As the command is failed with illegal request sense key, so the buffer
> used In function 'ses_recv_diag' will be not updated and so it will
> have all zero's. so the page_code will be zero and we observe below
> wrong error message even though vSES device has failed the command
> with illegal request error message and it has not returned any wrong
> diagnostic page.
>
> sdev_printk(KERN_ERR, sdev,
> "Wrong diagnostic page; asked for %d got %u\n",
>     page_code, recv_page_code);
>
> Thanks,
> Sreekanth

Hi Sreekanth,

To me, we should fix ses module to check  00h List of Supported
Diagnostic Pages first before ask for 07h.
Add James in CC.

Cheers,
Jack Wang


Re: [PATCH v2] blk: improve order of bio handling in generic_make_request()

2017-03-10 Thread Jack Wang


On 10.03.2017 15:55, Mikulas Patocka wrote:
> 
> 
> On Fri, 10 Mar 2017, Mike Snitzer wrote:
> 
>> On Fri, Mar 10 2017 at  7:34am -0500,
>> Lars Ellenberg  wrote:
>>
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -1975,7 +1975,14 @@ generic_make_request_checks(struct bio *bio)
   */
  blk_qc_t generic_make_request(struct bio *bio)
  {
 -   struct bio_list bio_list_on_stack;
 +   /*
 +* bio_list_on_stack[0] contains bios submitted by the current
 +* make_request_fn.
 +* bio_list_on_stack[1] contains bios that were submitted before
 +* the current make_request_fn, but that haven't been processed
 +* yet.
 +*/
 +   struct bio_list bio_list_on_stack[2];
 blk_qc_t ret = BLK_QC_T_NONE;
>>>
>>> May I suggest that, if you intend to assign something that is not a
>>> plain &(struct bio_list), but a &(struct bio_list[2]),
>>> you change the task member so it is renamed (current->bio_list vs
>>> current->bio_lists, plural, is what I did last year).
>>> Or you will break external modules, silently, and horribly (or,
>>> rather, they won't notice, but break the kernel).
>>> Examples of such modules would be DRBD, ZFS, quite possibly others.
>>
>> drbd is upstream -- so what is the problem?  (if you are having to
>> distribute drbd independent of the upstream drbd then why is drbd
>> upstream?)
>>
>> As for ZFS, worrying about ZFS kABI breakage is the last thing we should
>> be doing.
> 
> It's better to make external modules not compile than to silently 
> introduce bugs in them. So yes, I would rename that.
> 
> Mikulas

Agree, better rename current->bio_list to current->bio_lists

Regards,
Jack


Re: [PATCH v2] blk: improve order of bio handling in generic_make_request()

2017-03-10 Thread Jack Wang


On 10.03.2017 15:55, Mikulas Patocka wrote:
> 
> 
> On Fri, 10 Mar 2017, Mike Snitzer wrote:
> 
>> On Fri, Mar 10 2017 at  7:34am -0500,
>> Lars Ellenberg  wrote:
>>
 --- a/block/blk-core.c
 +++ b/block/blk-core.c
 @@ -1975,7 +1975,14 @@ generic_make_request_checks(struct bio *bio)
   */
  blk_qc_t generic_make_request(struct bio *bio)
  {
 -   struct bio_list bio_list_on_stack;
 +   /*
 +* bio_list_on_stack[0] contains bios submitted by the current
 +* make_request_fn.
 +* bio_list_on_stack[1] contains bios that were submitted before
 +* the current make_request_fn, but that haven't been processed
 +* yet.
 +*/
 +   struct bio_list bio_list_on_stack[2];
 blk_qc_t ret = BLK_QC_T_NONE;
>>>
>>> May I suggest that, if you intend to assign something that is not a
>>> plain &(struct bio_list), but a &(struct bio_list[2]),
>>> you change the task member so it is renamed (current->bio_list vs
>>> current->bio_lists, plural, is what I did last year).
>>> Or you will break external modules, silently, and horribly (or,
>>> rather, they won't notice, but break the kernel).
>>> Examples of such modules would be DRBD, ZFS, quite possibly others.
>>
>> drbd is upstream -- so what is the problem?  (if you are having to
>> distribute drbd independent of the upstream drbd then why is drbd
>> upstream?)
>>
>> As for ZFS, worrying about ZFS kABI breakage is the last thing we should
>> be doing.
> 
> It's better to make external modules not compile than to silently 
> introduce bugs in them. So yes, I would rename that.
> 
> Mikulas

Agree, better rename current->bio_list to current->bio_lists

Regards,
Jack


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-07 Thread Jack Wang


On 07.03.2017 16:46, Pavel Machek wrote:
> On Mon 2017-03-06 10:43:59, Jack Wang wrote:
>>
>>
>> On 06.03.2017 05:40, NeilBrown wrote:
>>> On Fri, Mar 03 2017, Jack Wang wrote:
>>>>
>>>> Thanks Neil for pushing the fix.
>>>>
>>>> We can optimize generic_make_request a little bit:
>>>> - assign bio_list struct hold directly instead init and merge
>>>> - remove duplicate code
>>>>
>>>> I think better to squash into your fix.
>>>
>>> Hi Jack,
>>>  I don't object to your changes, but I'd like to see a response from
>>>  Jens first.
>>>  My preference would be to get the original patch in, then other changes
>>>  that build on it, such as this one, can be added.  Until the core
>>>  changes lands, any other work is pointless.
>>>
>>>  Of course if Jens wants a this merged before he'll apply it, I'll
>>>  happily do that.
>>>
>>> Thanks,
>>> NeilBrown
>>
>> Hi Neil,
>>
>> Totally agree, let's wait for Jens's decision.
>>
>> Hi Jens,
>>
>> Please consider this fix also for stable 4.3+ 
> 
> Stable? We don't put this into stable, with exception of minimal fixes
> for real bugs...
>   Pavel
> 
It indeed fixes deadlock in RAID1 case.
Please follow the link I replied to Jens regarding the test.
It did hit us in our production.

Thanks,
Jack


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-07 Thread Jack Wang


On 07.03.2017 16:46, Pavel Machek wrote:
> On Mon 2017-03-06 10:43:59, Jack Wang wrote:
>>
>>
>> On 06.03.2017 05:40, NeilBrown wrote:
>>> On Fri, Mar 03 2017, Jack Wang wrote:
>>>>
>>>> Thanks Neil for pushing the fix.
>>>>
>>>> We can optimize generic_make_request a little bit:
>>>> - assign bio_list struct hold directly instead init and merge
>>>> - remove duplicate code
>>>>
>>>> I think better to squash into your fix.
>>>
>>> Hi Jack,
>>>  I don't object to your changes, but I'd like to see a response from
>>>  Jens first.
>>>  My preference would be to get the original patch in, then other changes
>>>  that build on it, such as this one, can be added.  Until the core
>>>  changes lands, any other work is pointless.
>>>
>>>  Of course if Jens wants a this merged before he'll apply it, I'll
>>>  happily do that.
>>>
>>> Thanks,
>>> NeilBrown
>>
>> Hi Neil,
>>
>> Totally agree, let's wait for Jens's decision.
>>
>> Hi Jens,
>>
>> Please consider this fix also for stable 4.3+ 
> 
> Stable? We don't put this into stable, with exception of minimal fixes
> for real bugs...
>   Pavel
> 
It indeed fixes deadlock in RAID1 case.
Please follow the link I replied to Jens regarding the test.
It did hit us in our production.

Thanks,
Jack


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-07 Thread Jack Wang


On 06.03.2017 21:18, Jens Axboe wrote:
> On 03/05/2017 09:40 PM, NeilBrown wrote:
>> On Fri, Mar 03 2017, Jack Wang wrote:
>>>
>>> Thanks Neil for pushing the fix.
>>>
>>> We can optimize generic_make_request a little bit:
>>> - assign bio_list struct hold directly instead init and merge
>>> - remove duplicate code
>>>
>>> I think better to squash into your fix.
>>
>> Hi Jack,
>>  I don't object to your changes, but I'd like to see a response from
>>  Jens first.
>>  My preference would be to get the original patch in, then other changes
>>  that build on it, such as this one, can be added.  Until the core
>>  changes lands, any other work is pointless.
>>
>>  Of course if Jens wants a this merged before he'll apply it, I'll
>>  happily do that.
> 
> I like the change, and thanks for tackling this. It's been a pending
> issue for way too long. I do think we should squash Jack's patch
> into the original, as it does clean up the code nicely.
> 
> Do we have a proper test case for this, so we can verify that it
> does indeed also work in practice?
> 
Hi Jens,

I can trigger deadlock with in RAID1 with test below:

I create one md with one local loop device and one remote scsi
exported by SRP. running fio with mix rw on top of md, force_close
session on storage side. mdx_raid1 is wait on free_array in D state,
and a lot of fio also in D state in wait_barrier.

With the patch from Neil above, I can no longer trigger it anymore.

The discussion was in link below:
http://www.spinics.net/lists/raid/msg54680.html

Thanks,
Jack Wang


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-07 Thread Jack Wang


On 06.03.2017 21:18, Jens Axboe wrote:
> On 03/05/2017 09:40 PM, NeilBrown wrote:
>> On Fri, Mar 03 2017, Jack Wang wrote:
>>>
>>> Thanks Neil for pushing the fix.
>>>
>>> We can optimize generic_make_request a little bit:
>>> - assign bio_list struct hold directly instead init and merge
>>> - remove duplicate code
>>>
>>> I think better to squash into your fix.
>>
>> Hi Jack,
>>  I don't object to your changes, but I'd like to see a response from
>>  Jens first.
>>  My preference would be to get the original patch in, then other changes
>>  that build on it, such as this one, can be added.  Until the core
>>  changes lands, any other work is pointless.
>>
>>  Of course if Jens wants a this merged before he'll apply it, I'll
>>  happily do that.
> 
> I like the change, and thanks for tackling this. It's been a pending
> issue for way too long. I do think we should squash Jack's patch
> into the original, as it does clean up the code nicely.
> 
> Do we have a proper test case for this, so we can verify that it
> does indeed also work in practice?
> 
Hi Jens,

I can trigger deadlock with in RAID1 with test below:

I create one md with one local loop device and one remote scsi
exported by SRP. running fio with mix rw on top of md, force_close
session on storage side. mdx_raid1 is wait on free_array in D state,
and a lot of fio also in D state in wait_barrier.

With the patch from Neil above, I can no longer trigger it anymore.

The discussion was in link below:
http://www.spinics.net/lists/raid/msg54680.html

Thanks,
Jack Wang


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-06 Thread Jack Wang


On 06.03.2017 05:40, NeilBrown wrote:
> On Fri, Mar 03 2017, Jack Wang wrote:
>>
>> Thanks Neil for pushing the fix.
>>
>> We can optimize generic_make_request a little bit:
>> - assign bio_list struct hold directly instead init and merge
>> - remove duplicate code
>>
>> I think better to squash into your fix.
> 
> Hi Jack,
>  I don't object to your changes, but I'd like to see a response from
>  Jens first.
>  My preference would be to get the original patch in, then other changes
>  that build on it, such as this one, can be added.  Until the core
>  changes lands, any other work is pointless.
> 
>  Of course if Jens wants a this merged before he'll apply it, I'll
>  happily do that.
> 
> Thanks,
> NeilBrown

Hi Neil,

Totally agree, let's wait for Jens's decision.

Hi Jens,

Please consider this fix also for stable 4.3+ 

Thanks,
Jack Wang

> 
> 
> 
>> ---
>>  block/blk-core.c | 9 ++---
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 3bc7202..b29b7e5 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2147,8 +2147,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>>  struct bio_list lower, same, hold;
>>  
>>  /* Create a fresh bio_list for all subordinate requests 
>> */
>> -bio_list_init();
>> -bio_list_merge(, _list_on_stack);
>> +hold = bio_list_on_stack;
>>  bio_list_init(_list_on_stack);
>>  
>>  ret = q->make_request_fn(q, bio);
>> @@ -2168,14 +2167,10 @@ blk_qc_t generic_make_request(struct bio *bio)
>>  bio_list_merge(_list_on_stack, );
>>  bio_list_merge(_list_on_stack, );
>>  bio_list_merge(_list_on_stack, );
>> -
>> -bio = bio_list_pop(current->bio_list);
>>  } else {
>> -struct bio *bio_next = bio_list_pop(current->bio_list);
>> -
>>  bio_io_error(bio);
>> -bio = bio_next;
>>  }
>> +bio = bio_list_pop(current->bio_list);
>>  } while (bio);
>>  current->bio_list = NULL; /* deactivate */
>>  
>> -- 
>>
>> Regards,
>> Jack 


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-06 Thread Jack Wang


On 06.03.2017 05:40, NeilBrown wrote:
> On Fri, Mar 03 2017, Jack Wang wrote:
>>
>> Thanks Neil for pushing the fix.
>>
>> We can optimize generic_make_request a little bit:
>> - assign bio_list struct hold directly instead init and merge
>> - remove duplicate code
>>
>> I think better to squash into your fix.
> 
> Hi Jack,
>  I don't object to your changes, but I'd like to see a response from
>  Jens first.
>  My preference would be to get the original patch in, then other changes
>  that build on it, such as this one, can be added.  Until the core
>  changes lands, any other work is pointless.
> 
>  Of course if Jens wants a this merged before he'll apply it, I'll
>  happily do that.
> 
> Thanks,
> NeilBrown

Hi Neil,

Totally agree, let's wait for Jens's decision.

Hi Jens,

Please consider this fix also for stable 4.3+ 

Thanks,
Jack Wang

> 
> 
> 
>> ---
>>  block/blk-core.c | 9 ++---
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 3bc7202..b29b7e5 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2147,8 +2147,7 @@ blk_qc_t generic_make_request(struct bio *bio)
>>  struct bio_list lower, same, hold;
>>  
>>  /* Create a fresh bio_list for all subordinate requests 
>> */
>> -bio_list_init();
>> -bio_list_merge(, _list_on_stack);
>> +hold = bio_list_on_stack;
>>  bio_list_init(_list_on_stack);
>>  
>>  ret = q->make_request_fn(q, bio);
>> @@ -2168,14 +2167,10 @@ blk_qc_t generic_make_request(struct bio *bio)
>>  bio_list_merge(_list_on_stack, );
>>  bio_list_merge(_list_on_stack, );
>>  bio_list_merge(_list_on_stack, );
>> -
>> -bio = bio_list_pop(current->bio_list);
>>  } else {
>> -struct bio *bio_next = bio_list_pop(current->bio_list);
>> -
>>  bio_io_error(bio);
>> -bio = bio_next;
>>  }
>> +bio = bio_list_pop(current->bio_list);
>>  } while (bio);
>>  current->bio_list = NULL; /* deactivate */
>>  
>> -- 
>>
>> Regards,
>> Jack 


Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-03 Thread Jack Wang


On 03.03.2017 06:14, NeilBrown wrote:
> 
> [ Hi Jens,
>   you might have seen assorted email threads recently about
>   deadlocks, particular in dm-snap or md/raid1/10.  Also about
>   the excess of rescuer threads.
>   I think a big part of the problem is my ancient improvement
>   to generic_make_request to queue bios and handle them in
>   a strict FIFO order.  As described below, that can cause
>   problems which individual block devices cannot fix themselves
>   without punting to various threads.
>   This patch does not fix everything, but provides a basis that
>   drives can build on to create dead-lock free solutions without
>   excess threads.
>   If you accept this, I will look into improving at least md
>   and bio_alloc_set() to be less dependant on rescuer threads.
>   Thanks,
>   NeilBrown
>  ]
> 
> 
> To avoid recursion on the kernel stack when stacked block devices
> are in use, generic_make_request() will, when called recursively,
> queue new requests for later handling.  They will be handled when the
> make_request_fn for the current bio completes.
> 
> If any bios are submitted by a make_request_fn, these will ultimately
> handled seqeuntially.  If the handling of one of those generates
> further requests, they will be added to the end of the queue.
> 
> This strict first-in-first-out behaviour can lead to deadlocks in
> various ways, normally because a request might need to wait for a
> previous request to the same device to complete.  This can happen when
> they share a mempool, and can happen due to interdependencies
> particular to the device.  Both md and dm have examples where this happens.
> 
> These deadlocks can be erradicated by more selective ordering of bios.
> Specifically by handling them in depth-first order.  That is: when the
> handling of one bio generates one or more further bios, they are
> handled immediately after the parent, before any siblings of the
> parent.  That way, when generic_make_request() calls make_request_fn
> for some particular device, it we can be certain that all previously
> submited request for that device have been completely handled and are
> not waiting for anything in the queue of requests maintained in
> generic_make_request().
> 
> An easy way to achieve this would be to use a last-in-first-out stack
> instead of a queue.  However this will change the order of consecutive
> bios submitted by a make_request_fn, which could have unexpected consequences.
> Instead we take a slightly more complex approach.
> A fresh queue is created for each call to a make_request_fn.  After it 
> completes,
> any bios for a different device are placed on the front of the main queue, 
> followed
> by any bios for the same device, followed by all bios that were already on
> the queue before the make_request_fn was called.
> This provides the depth-first approach without reordering bios on the same 
> level.
> 
> This, by itself, it not enough to remove the deadlocks.  It just makes
> it possible for drivers to take the extra step required themselves.
> 
> To avoid deadlocks, drivers must never risk waiting for a request
> after submitting one to generic_make_request.  This includes never
> allocing from a mempool twice in the one call to a make_request_fn.
> 
> A common pattern in drivers is to call bio_split() in a loop, handling
> the first part and then looping around to possibly split the next part.
> Instead, a driver that finds it needs to split a bio should queue
> (with generic_make_request) the second part, handle the first part,
> and then return.  The new code in generic_make_request will ensure the
> requests to underlying bios are processed first, then the second bio
> that was split off.  If it splits again, the same process happens.  In
> each case one bio will be completely handled before the next one is attempted.
> 
> With this is place, it should be possible to disable the
> punt_bios_to_recover() recovery thread for many block devices, and
> eventually it may be possible to remove it completely.
> 
> Tested-by: Jinpu Wang 
> Inspired-by: Lars Ellenberg 
> Signed-off-by: NeilBrown 
> ---
>  block/blk-core.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b9e857f4afe8..ef55f210dd7c 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2018,10 +2018,32 @@ blk_qc_t generic_make_request(struct bio *bio)
>   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>  
>   if (likely(blk_queue_enter(q, false) == 0)) {
> + struct bio_list hold;
> + struct bio_list lower, same;
> +
> + /* Create a fresh bio_list for all subordinate requests 
> */
> + bio_list_init();
> + bio_list_merge(, _list_on_stack);
> + bio_list_init(_list_on_stack);
> 

Re: [PATCH] blk: improve order of bio handling in generic_make_request()

2017-03-03 Thread Jack Wang


On 03.03.2017 06:14, NeilBrown wrote:
> 
> [ Hi Jens,
>   you might have seen assorted email threads recently about
>   deadlocks, particular in dm-snap or md/raid1/10.  Also about
>   the excess of rescuer threads.
>   I think a big part of the problem is my ancient improvement
>   to generic_make_request to queue bios and handle them in
>   a strict FIFO order.  As described below, that can cause
>   problems which individual block devices cannot fix themselves
>   without punting to various threads.
>   This patch does not fix everything, but provides a basis that
>   drives can build on to create dead-lock free solutions without
>   excess threads.
>   If you accept this, I will look into improving at least md
>   and bio_alloc_set() to be less dependant on rescuer threads.
>   Thanks,
>   NeilBrown
>  ]
> 
> 
> To avoid recursion on the kernel stack when stacked block devices
> are in use, generic_make_request() will, when called recursively,
> queue new requests for later handling.  They will be handled when the
> make_request_fn for the current bio completes.
> 
> If any bios are submitted by a make_request_fn, these will ultimately
> handled seqeuntially.  If the handling of one of those generates
> further requests, they will be added to the end of the queue.
> 
> This strict first-in-first-out behaviour can lead to deadlocks in
> various ways, normally because a request might need to wait for a
> previous request to the same device to complete.  This can happen when
> they share a mempool, and can happen due to interdependencies
> particular to the device.  Both md and dm have examples where this happens.
> 
> These deadlocks can be erradicated by more selective ordering of bios.
> Specifically by handling them in depth-first order.  That is: when the
> handling of one bio generates one or more further bios, they are
> handled immediately after the parent, before any siblings of the
> parent.  That way, when generic_make_request() calls make_request_fn
> for some particular device, it we can be certain that all previously
> submited request for that device have been completely handled and are
> not waiting for anything in the queue of requests maintained in
> generic_make_request().
> 
> An easy way to achieve this would be to use a last-in-first-out stack
> instead of a queue.  However this will change the order of consecutive
> bios submitted by a make_request_fn, which could have unexpected consequences.
> Instead we take a slightly more complex approach.
> A fresh queue is created for each call to a make_request_fn.  After it 
> completes,
> any bios for a different device are placed on the front of the main queue, 
> followed
> by any bios for the same device, followed by all bios that were already on
> the queue before the make_request_fn was called.
> This provides the depth-first approach without reordering bios on the same 
> level.
> 
> This, by itself, it not enough to remove the deadlocks.  It just makes
> it possible for drivers to take the extra step required themselves.
> 
> To avoid deadlocks, drivers must never risk waiting for a request
> after submitting one to generic_make_request.  This includes never
> allocing from a mempool twice in the one call to a make_request_fn.
> 
> A common pattern in drivers is to call bio_split() in a loop, handling
> the first part and then looping around to possibly split the next part.
> Instead, a driver that finds it needs to split a bio should queue
> (with generic_make_request) the second part, handle the first part,
> and then return.  The new code in generic_make_request will ensure the
> requests to underlying bios are processed first, then the second bio
> that was split off.  If it splits again, the same process happens.  In
> each case one bio will be completely handled before the next one is attempted.
> 
> With this is place, it should be possible to disable the
> punt_bios_to_recover() recovery thread for many block devices, and
> eventually it may be possible to remove it completely.
> 
> Tested-by: Jinpu Wang 
> Inspired-by: Lars Ellenberg 
> Signed-off-by: NeilBrown 
> ---
>  block/blk-core.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b9e857f4afe8..ef55f210dd7c 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2018,10 +2018,32 @@ blk_qc_t generic_make_request(struct bio *bio)
>   struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>  
>   if (likely(blk_queue_enter(q, false) == 0)) {
> + struct bio_list hold;
> + struct bio_list lower, same;
> +
> + /* Create a fresh bio_list for all subordinate requests 
> */
> + bio_list_init();
> + bio_list_merge(, _list_on_stack);
> + bio_list_init(_list_on_stack);
>   ret = q->make_request_fn(q, bio);
>  
>   

Re: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion

2017-01-02 Thread Jack Wang
2016-12-23 12:45 GMT+01:00 Lars Ellenberg <lars.ellenb...@linbit.com>:
> On Fri, Dec 23, 2016 at 09:49:53AM +0100, Michael Wang wrote:
>> Dear Maintainers
>>
>> I'd like to ask for the status of this patch since we hit the
>> issue too during our testing on md raid1.
>>
>> Split remainder bio_A was queued ahead, following by bio_B for
>> lower device, at this moment raid start freezing, the loop take
>> out bio_A firstly and deliver it, which will hung since raid is
>> freezing, while the freezing never end since it waiting for
>> bio_B to finish, and bio_B is still on the queue, waiting for
>> bio_A to finish...
>>
>> We're looking for a good solution and we found this patch
>> already progressed a lot, but we can't find it on linux-next,
>> so we'd like to ask are we still planning to have this fix
>> in upstream?
>
> I don't see why not, I'd even like to have it in older kernels,
> but did not have the time and energy to push it.
>
> Thanks for the bump.
>
> Lars
>
Hi folks,

As Michael mentioned, we hit a bug this patch is trying to fix.
Neil suggested another way to fix it.  I attached below.
I personal prefer Neil's version as it's less code change, and straight forward.

Could you share your comments, we can get one fix into mainline.

Thanks,
Jinpu
From 69a4829a55503e496ce9c730d2c8e3dd8a08874a Mon Sep 17 00:00:00 2001
From: NeilBrown <ne...@suse.com>
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()

When we call wait_barrier, we might have some bios waiting
in current->bio_list, which prevents the array_freeze call to
complete. Those can only be internal READs, which have already
passed the wait_barrier call (thus incrementing nr_pending), but
still were not submitted to the lower level, due to generic_make_request
logic to avoid recursive calls. In such case, we have a deadlock:
- array_frozen is already set to 1, so wait_barrier unconditionally waits, so
- internal READ bios will not be submitted, thus freeze_array will
never completes.

To fix this, modify generic_make_request to always sort bio_list_on_stack
first with lowest level, then higher, until same level.

Sent to linux-raid mail list:
https://marc.info/?l=linux-raid=148232453107685=2

Suggested-by: NeilBrown <ne...@suse.com>
Signed-off-by: Jack Wang <jinpu.w...@profitbricks.com>
---
 block/blk-core.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9e3ac56..47ef373 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
+			struct bio_list lower, same, hold;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			bio_list_init();
+			bio_list_merge(, _list_on_stack);
+			bio_list_init(_list_on_stack);
 
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init();
+			bio_list_init();
+			while ((bio = bio_list_pop(_list_on_stack)) != NULL)
+if (q == bdev_get_queue(bio->bi_bdev))
+	bio_list_add(, bio);
+else
+	bio_list_add(, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(_list_on_stack, );
+			bio_list_merge(_list_on_stack, );
+			bio_list_merge(_list_on_stack, );
 
 			bio = bio_list_pop(current->bio_list);
 		} else {
-- 
2.7.4



Re: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion

2017-01-02 Thread Jack Wang
2016-12-23 12:45 GMT+01:00 Lars Ellenberg :
> On Fri, Dec 23, 2016 at 09:49:53AM +0100, Michael Wang wrote:
>> Dear Maintainers
>>
>> I'd like to ask for the status of this patch since we hit the
>> issue too during our testing on md raid1.
>>
>> Split remainder bio_A was queued ahead, following by bio_B for
>> lower device, at this moment raid start freezing, the loop take
>> out bio_A firstly and deliver it, which will hung since raid is
>> freezing, while the freezing never end since it waiting for
>> bio_B to finish, and bio_B is still on the queue, waiting for
>> bio_A to finish...
>>
>> We're looking for a good solution and we found this patch
>> already progressed a lot, but we can't find it on linux-next,
>> so we'd like to ask are we still planning to have this fix
>> in upstream?
>
> I don't see why not, I'd even like to have it in older kernels,
> but did not have the time and energy to push it.
>
> Thanks for the bump.
>
> Lars
>
Hi folks,

As Michael mentioned, we hit a bug this patch is trying to fix.
Neil suggested another way to fix it.  I attached below.
I personal prefer Neil's version as it's less code change, and straight forward.

Could you share your comments, we can get one fix into mainline.

Thanks,
Jinpu
From 69a4829a55503e496ce9c730d2c8e3dd8a08874a Mon Sep 17 00:00:00 2001
From: NeilBrown 
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()

When we call wait_barrier, we might have some bios waiting
in current->bio_list, which prevents the array_freeze call to
complete. Those can only be internal READs, which have already
passed the wait_barrier call (thus incrementing nr_pending), but
still were not submitted to the lower level, due to generic_make_request
logic to avoid recursive calls. In such case, we have a deadlock:
- array_frozen is already set to 1, so wait_barrier unconditionally waits, so
- internal READ bios will not be submitted, thus freeze_array will
never completes.

To fix this, modify generic_make_request to always sort bio_list_on_stack
first with lowest level, then higher, until same level.

Sent to linux-raid mail list:
https://marc.info/?l=linux-raid=148232453107685=2

Suggested-by: NeilBrown 
Signed-off-by: Jack Wang 
---
 block/blk-core.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9e3ac56..47ef373 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
+			struct bio_list lower, same, hold;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			bio_list_init();
+			bio_list_merge(, _list_on_stack);
+			bio_list_init(_list_on_stack);
 
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init();
+			bio_list_init();
+			while ((bio = bio_list_pop(_list_on_stack)) != NULL)
+if (q == bdev_get_queue(bio->bi_bdev))
+	bio_list_add(, bio);
+else
+	bio_list_add(, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(_list_on_stack, );
+			bio_list_merge(_list_on_stack, );
+			bio_list_merge(_list_on_stack, );
 
 			bio = bio_list_pop(current->bio_list);
 		} else {
-- 
2.7.4



Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-24 Thread Jack Wang
Hi Neil,

2016-11-24 5:47 GMT+01:00 NeilBrown <ne...@suse.com>:
> On Sat, Nov 19 2016, Jack Wang wrote:
>
>> 2016-11-18 6:16 GMT+01:00 NeilBrown <ne...@suse.com>:
>>> Hi,
>>>
>>>  I've been sitting on these patches for a while because although they
>>>  solve a real problem, it is a fairly limited use-case, and I don't
>>>  really like some of the details.
>>>
>>>  So I'm posting them as RFC in the hope that a different perspective
>>>  might help me like them better, or find a better approach.
>>>
>>>  The core idea is that when you have multiple copies of data
>>>  (i.e. mirrored drives) it doesn't make sense to wait for a read from
>>>  a drive that seems to be having problems.  It will probably be faster
>>>  to just cancel that read, and read from the other device.
>>>  Similarly, in some circumstances, it might be better to fail a drive
>>>  that is being slow to respond to writes, rather than cause all writes
>>>  to be very slow.
>>>
>>>  The particular context where this comes up is when mirroring across
>>>  storage arrays, where the storage arrays can temporarily take an
>>>  unusually long time to respond to requests (firmware updates have
>>>  been mentioned).  As the array will have redundancy internally, there
>>>  is little risk to the data.  The mirrored pair is really only for
>>>  disaster recovery, and it is deemed better to lose the last few
>>>  minutes of updates in the case of a serious disaster, rather than
>>>  occasionally having latency issues because one array needs to do some
>>>  maintenance for a few minutes.  The particular storage arrays in
>>>  question are DASD devices which are part of the s390 ecosystem.
>>
>> Hi Neil,
>>
>> Thanks for pushing this feature also to mainline.
>> We at Profitbricks use raid1 across IB network, one pserver with
>> raid1, both legs on 2 remote storages.
>> We've noticed if one remote storage crash , and raid1 still keep
>> sending IO to the faulty leg, even after 5 minutes,
>> md still redirect I/Os, and md refuse to remove active disks, eg:
>
> That make sense.  It cannot remove the active disk until all pending IO
> completes, either with an error or with success.
>
> If the target has a long timeout, that can delay progress a lot.
>
>>
>> I tried to port you patch from SLES[1], with the patchset, it reduce
>> the time to ~30 seconds.
>>
>> I'm happy to see this feature upstream :)
>> I will test again this new patchset.
>
> Thanks for your confirmation that this is more generally useful than I
> thought, and I'm always happy to hear for more testing :-)
>
> Thanks,
> NeilBrown

Just want to update test result, so far it's working fine, no regression :)
Will report if anything breaks.

Thanks
Jack


Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-24 Thread Jack Wang
Hi Neil,

2016-11-24 5:47 GMT+01:00 NeilBrown :
> On Sat, Nov 19 2016, Jack Wang wrote:
>
>> 2016-11-18 6:16 GMT+01:00 NeilBrown :
>>> Hi,
>>>
>>>  I've been sitting on these patches for a while because although they
>>>  solve a real problem, it is a fairly limited use-case, and I don't
>>>  really like some of the details.
>>>
>>>  So I'm posting them as RFC in the hope that a different perspective
>>>  might help me like them better, or find a better approach.
>>>
>>>  The core idea is that when you have multiple copies of data
>>>  (i.e. mirrored drives) it doesn't make sense to wait for a read from
>>>  a drive that seems to be having problems.  It will probably be faster
>>>  to just cancel that read, and read from the other device.
>>>  Similarly, in some circumstances, it might be better to fail a drive
>>>  that is being slow to respond to writes, rather than cause all writes
>>>  to be very slow.
>>>
>>>  The particular context where this comes up is when mirroring across
>>>  storage arrays, where the storage arrays can temporarily take an
>>>  unusually long time to respond to requests (firmware updates have
>>>  been mentioned).  As the array will have redundancy internally, there
>>>  is little risk to the data.  The mirrored pair is really only for
>>>  disaster recovery, and it is deemed better to lose the last few
>>>  minutes of updates in the case of a serious disaster, rather than
>>>  occasionally having latency issues because one array needs to do some
>>>  maintenance for a few minutes.  The particular storage arrays in
>>>  question are DASD devices which are part of the s390 ecosystem.
>>
>> Hi Neil,
>>
>> Thanks for pushing this feature also to mainline.
>> We at Profitbricks use raid1 across IB network, one pserver with
>> raid1, both legs on 2 remote storages.
>> We've noticed if one remote storage crash , and raid1 still keep
>> sending IO to the faulty leg, even after 5 minutes,
>> md still redirect I/Os, and md refuse to remove active disks, eg:
>
> That make sense.  It cannot remove the active disk until all pending IO
> completes, either with an error or with success.
>
> If the target has a long timeout, that can delay progress a lot.
>
>>
>> I tried to port you patch from SLES[1], with the patchset, it reduce
>> the time to ~30 seconds.
>>
>> I'm happy to see this feature upstream :)
>> I will test again this new patchset.
>
> Thanks for your confirmation that this is more generally useful than I
> thought, and I'm always happy to hear for more testing :-)
>
> Thanks,
> NeilBrown

Just want to update test result, so far it's working fine, no regression :)
Will report if anything breaks.

Thanks
Jack


Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-18 Thread Jack Wang
2016-11-18 6:16 GMT+01:00 NeilBrown <ne...@suse.com>:
> Hi,
>
>  I've been sitting on these patches for a while because although they
>  solve a real problem, it is a fairly limited use-case, and I don't
>  really like some of the details.
>
>  So I'm posting them as RFC in the hope that a different perspective
>  might help me like them better, or find a better approach.
>
>  The core idea is that when you have multiple copies of data
>  (i.e. mirrored drives) it doesn't make sense to wait for a read from
>  a drive that seems to be having problems.  It will probably be faster
>  to just cancel that read, and read from the other device.
>  Similarly, in some circumstances, it might be better to fail a drive
>  that is being slow to respond to writes, rather than cause all writes
>  to be very slow.
>
>  The particular context where this comes up is when mirroring across
>  storage arrays, where the storage arrays can temporarily take an
>  unusually long time to respond to requests (firmware updates have
>  been mentioned).  As the array will have redundancy internally, there
>  is little risk to the data.  The mirrored pair is really only for
>  disaster recovery, and it is deemed better to lose the last few
>  minutes of updates in the case of a serious disaster, rather than
>  occasionally having latency issues because one array needs to do some
>  maintenance for a few minutes.  The particular storage arrays in
>  question are DASD devices which are part of the s390 ecosystem.

Hi Neil,

Thanks for pushing this feature also to mainline.
We at Profitbricks use raid1 across IB network, one pserver with
raid1, both legs on 2 remote storages.
We've noticed if one remote storage crash , and raid1 still keep
sending IO to the faulty leg, even after 5 minutes,
md still redirect I/Os, and md refuse to remove active disks, eg:

2016-10-27T19:47:07.776233+02:00 pserver25 kernel: [184749.101984]
md/raid1:md23: Disk failure on ibnbd47, disabling device.

2016-10-27T19:47:07.776243+02:00 pserver25 kernel: [184749.101984]
md/raid1:md23: Operation continuing on 1 devices.

[...]

2016-10-27T19:47:16.171694+02:00 pserver25 kernel: [184757.498693]
md/raid1:md23: redirecting sector 79104 to other mirror: ibnbd46

[...]

2016-10-27T19:47:21.301732+02:00 pserver25 kernel: [184762.627288]
md/raid1:md23: redirecting sector 79232 to other mirror: ibnbd46

[...]

2016-10-27T19:47:35.501725+02:00 pserver25 kernel: [184776.829069] md:
cannot remove active disk ibnbd47 from md23 ...

2016-10-27T19:47:36.801769+02:00 pserver25 kernel: [184778.128856] md:
cannot remove active disk ibnbd47 from md23 ...

[...]

2016-10-27T19:52:33.401816+02:00 pserver25 kernel: [185074.727859]
md/raid1:md23: redirecting sector 72832 to other mirror: ibnbd46

2016-10-27T19:52:36.601693+02:00 pserver25 kernel: [185077.924835]
md/raid1:md23: redirecting sector 78336 to other mirror: ibnbd46

2016-10-27T19:52:36.601728+02:00 pserver25 kernel: [185077.925083]
RAID1 conf printout:

2016-10-27T19:52:36.601731+02:00 pserver25 kernel: [185077.925087]
--- wd:1 rd:2

2016-10-27T19:52:36.601733+02:00 pserver25 kernel: [185077.925091]
disk 0, wo:0, o:1, dev:ibnbd46

2016-10-27T19:52:36.601735+02:00 pserver25 kernel: [185077.925093]
disk 1, wo:1, o:0, dev:ibnbd47

2016-10-27T19:52:36.681691+02:00 pserver25 kernel: [185078.003392]
RAID1 conf printout:

2016-10-27T19:52:36.681706+02:00 pserver25 kernel: [185078.003404]
--- wd:1 rd:2

2016-10-27T19:52:36.681709+02:00 pserver25 kernel: [185078.003409]
disk 0, wo:0, o:1, dev:ibnbd46

I tried to port you patch from SLES[1], with the patchset, it reduce
the time to ~30 seconds.

I'm happy to see this feature upstream :)
I will test again this new patchset.

Cheers,
Jack Wang


Re: [PATCH/RFC] add "failfast" support for raid1/raid10.

2016-11-18 Thread Jack Wang
2016-11-18 6:16 GMT+01:00 NeilBrown :
> Hi,
>
>  I've been sitting on these patches for a while because although they
>  solve a real problem, it is a fairly limited use-case, and I don't
>  really like some of the details.
>
>  So I'm posting them as RFC in the hope that a different perspective
>  might help me like them better, or find a better approach.
>
>  The core idea is that when you have multiple copies of data
>  (i.e. mirrored drives) it doesn't make sense to wait for a read from
>  a drive that seems to be having problems.  It will probably be faster
>  to just cancel that read, and read from the other device.
>  Similarly, in some circumstances, it might be better to fail a drive
>  that is being slow to respond to writes, rather than cause all writes
>  to be very slow.
>
>  The particular context where this comes up is when mirroring across
>  storage arrays, where the storage arrays can temporarily take an
>  unusually long time to respond to requests (firmware updates have
>  been mentioned).  As the array will have redundancy internally, there
>  is little risk to the data.  The mirrored pair is really only for
>  disaster recovery, and it is deemed better to lose the last few
>  minutes of updates in the case of a serious disaster, rather than
>  occasionally having latency issues because one array needs to do some
>  maintenance for a few minutes.  The particular storage arrays in
>  question are DASD devices which are part of the s390 ecosystem.

Hi Neil,

Thanks for pushing this feature also to mainline.
We at Profitbricks use raid1 across IB network, one pserver with
raid1, both legs on 2 remote storages.
We've noticed if one remote storage crash , and raid1 still keep
sending IO to the faulty leg, even after 5 minutes,
md still redirect I/Os, and md refuse to remove active disks, eg:

2016-10-27T19:47:07.776233+02:00 pserver25 kernel: [184749.101984]
md/raid1:md23: Disk failure on ibnbd47, disabling device.

2016-10-27T19:47:07.776243+02:00 pserver25 kernel: [184749.101984]
md/raid1:md23: Operation continuing on 1 devices.

[...]

2016-10-27T19:47:16.171694+02:00 pserver25 kernel: [184757.498693]
md/raid1:md23: redirecting sector 79104 to other mirror: ibnbd46

[...]

2016-10-27T19:47:21.301732+02:00 pserver25 kernel: [184762.627288]
md/raid1:md23: redirecting sector 79232 to other mirror: ibnbd46

[...]

2016-10-27T19:47:35.501725+02:00 pserver25 kernel: [184776.829069] md:
cannot remove active disk ibnbd47 from md23 ...

2016-10-27T19:47:36.801769+02:00 pserver25 kernel: [184778.128856] md:
cannot remove active disk ibnbd47 from md23 ...

[...]

2016-10-27T19:52:33.401816+02:00 pserver25 kernel: [185074.727859]
md/raid1:md23: redirecting sector 72832 to other mirror: ibnbd46

2016-10-27T19:52:36.601693+02:00 pserver25 kernel: [185077.924835]
md/raid1:md23: redirecting sector 78336 to other mirror: ibnbd46

2016-10-27T19:52:36.601728+02:00 pserver25 kernel: [185077.925083]
RAID1 conf printout:

2016-10-27T19:52:36.601731+02:00 pserver25 kernel: [185077.925087]
--- wd:1 rd:2

2016-10-27T19:52:36.601733+02:00 pserver25 kernel: [185077.925091]
disk 0, wo:0, o:1, dev:ibnbd46

2016-10-27T19:52:36.601735+02:00 pserver25 kernel: [185077.925093]
disk 1, wo:1, o:0, dev:ibnbd47

2016-10-27T19:52:36.681691+02:00 pserver25 kernel: [185078.003392]
RAID1 conf printout:

2016-10-27T19:52:36.681706+02:00 pserver25 kernel: [185078.003404]
--- wd:1 rd:2

2016-10-27T19:52:36.681709+02:00 pserver25 kernel: [185078.003409]
disk 0, wo:0, o:1, dev:ibnbd46

I tried to port you patch from SLES[1], with the patchset, it reduce
the time to ~30 seconds.

I'm happy to see this feature upstream :)
I will test again this new patchset.

Cheers,
Jack Wang


Re: [PATCH v2 2/8] IB/core: Replace semaphore sm_sem with completion

2016-10-25 Thread Jack Wang
Hi Binoy,
2016-10-25 17:08 GMT+02:00 Binoy Jayan <binoy.ja...@linaro.org>:
> On 25 October 2016 at 18:13, Jack Wang <xjtu...@gmail.com> wrote:
>> Hi Binoy,
>>
>> snip
>>>
>>> port->ib_dev   = device;
>>> port->port_num = port_num;
>>> -   sema_init(>sm_sem, 1);
>>> +   init_completion(>sm_comp);
>>> +   complete(>sm_comp);
>>
>> Why complete here?
>>
>>> mutex_init(>file_mutex);
>>> INIT_LIST_HEAD(>file_list);
>>>
>>> --
>> KR,
>> Jinpu
>
>
> Hi Jack,
>
> ib_umad_sm_open() calls wait_for_completion_interruptible() which comes before
> ib_umad_sm_close() that calls complete(). In the initial open() there
> will not be
> anybody to signal the completion, so the complete is called to mark
> the initial state.
> I am not sure if this is the right way to do it, though.
>
> -Binoy

>From Documentation/scheduler/completion.txt ,
"
117 This is not implying any temporal order on wait_for_completion() and the
118 call to complete() - if the call to complete() happened before the call
119 to wait_for_completion() then the waiting side simply will continue
120 immediately as all dependencies are satisfied if not it will block until
121 completion is signaled by complete().
"
In this case here, if sm_open/sm_close are paired, it should work.

KR
Jack


Re: [PATCH v2 2/8] IB/core: Replace semaphore sm_sem with completion

2016-10-25 Thread Jack Wang
Hi Binoy,
2016-10-25 17:08 GMT+02:00 Binoy Jayan :
> On 25 October 2016 at 18:13, Jack Wang  wrote:
>> Hi Binoy,
>>
>> snip
>>>
>>> port->ib_dev   = device;
>>> port->port_num = port_num;
>>> -   sema_init(>sm_sem, 1);
>>> +   init_completion(>sm_comp);
>>> +   complete(>sm_comp);
>>
>> Why complete here?
>>
>>> mutex_init(>file_mutex);
>>> INIT_LIST_HEAD(>file_list);
>>>
>>> --
>> KR,
>> Jinpu
>
>
> Hi Jack,
>
> ib_umad_sm_open() calls wait_for_completion_interruptible() which comes before
> ib_umad_sm_close() that calls complete(). In the initial open() there
> will not be
> anybody to signal the completion, so the complete is called to mark
> the initial state.
> I am not sure if this is the right way to do it, though.
>
> -Binoy

>From Documentation/scheduler/completion.txt ,
"
117 This is not implying any temporal order on wait_for_completion() and the
118 call to complete() - if the call to complete() happened before the call
119 to wait_for_completion() then the waiting side simply will continue
120 immediately as all dependencies are satisfied if not it will block until
121 completion is signaled by complete().
"
In this case here, if sm_open/sm_close are paired, it should work.

KR
Jack


Re: [PATCH v2 2/8] IB/core: Replace semaphore sm_sem with completion

2016-10-25 Thread Jack Wang
Hi Binoy,

snip
>
> port->ib_dev   = device;
> port->port_num = port_num;
> -   sema_init(>sm_sem, 1);
> +   init_completion(>sm_comp);
> +   complete(>sm_comp);

Why complete here?

> mutex_init(>file_mutex);
> INIT_LIST_HEAD(>file_list);
>
> --
KR,
Jinpu


Re: [PATCH v2 2/8] IB/core: Replace semaphore sm_sem with completion

2016-10-25 Thread Jack Wang
Hi Binoy,

snip
>
> port->ib_dev   = device;
> port->port_num = port_num;
> -   sema_init(>sm_sem, 1);
> +   init_completion(>sm_comp);
> +   complete(>sm_comp);

Why complete here?

> mutex_init(>file_mutex);
> INIT_LIST_HEAD(>file_list);
>
> --
KR,
Jinpu


Re: [PATCH] pm80xx: Don't override ts->stat on IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY

2015-08-17 Thread Jack Wang
2015-08-17 15:04 GMT+02:00 Johannes Thumshirn :
> In case XXX returns with a status of IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY
> ts->stat gets set to SAS_OPEN_REJECT but a missing 'break' statement causes a
> fallthrough to the default handler of the switch statement overriding ts->stat
> to SAS_DEV_NO_RESPONSE.
>
> Signed-off-by: Johannes Thumshirn 

Thanks, please feel free to add:
Acked-by: Jack Wang 

> ---
>  drivers/scsi/pm8001/pm8001_hwi.c | 1 +
>  drivers/scsi/pm8001/pm80xx_hwi.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/pm8001/pm8001_hwi.c 
> b/drivers/scsi/pm8001/pm8001_hwi.c
> index 96dcc09..d0feec5 100644
> --- a/drivers/scsi/pm8001/pm8001_hwi.c
> +++ b/drivers/scsi/pm8001/pm8001_hwi.c
> @@ -2642,6 +2642,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, 
> void *piomb)
> ts->resp = SAS_TASK_COMPLETE;
> ts->stat = SAS_OPEN_REJECT;
> ts->open_rej_reason = SAS_OREJ_RSVD_RETRY;
> +   break;
> default:
> PM8001_IO_DBG(pm8001_ha,
> pm8001_printk("Unknown status 0x%x\n", status));
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
> b/drivers/scsi/pm8001/pm80xx_hwi.c
> index 05cce46..18d4ac4 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> @@ -2314,6 +2314,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, 
> void *piomb)
> ts->resp = SAS_TASK_COMPLETE;
> ts->stat = SAS_OPEN_REJECT;
> ts->open_rej_reason = SAS_OREJ_RSVD_RETRY;
> +   break;
> default:
> PM8001_IO_DBG(pm8001_ha,
> pm8001_printk("Unknown status 0x%x\n", status));
> --
> 2.5.0
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] pm80xx: Don't override ts-stat on IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY

2015-08-17 Thread Jack Wang
2015-08-17 15:04 GMT+02:00 Johannes Thumshirn jthumsh...@suse.de:
 In case XXX returns with a status of IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY
 ts-stat gets set to SAS_OPEN_REJECT but a missing 'break' statement causes a
 fallthrough to the default handler of the switch statement overriding ts-stat
 to SAS_DEV_NO_RESPONSE.

 Signed-off-by: Johannes Thumshirn jthumsh...@suse.de

Thanks, please feel free to add:
Acked-by: Jack Wang jinpu.w...@profitbricks.com

 ---
  drivers/scsi/pm8001/pm8001_hwi.c | 1 +
  drivers/scsi/pm8001/pm80xx_hwi.c | 1 +
  2 files changed, 2 insertions(+)

 diff --git a/drivers/scsi/pm8001/pm8001_hwi.c 
 b/drivers/scsi/pm8001/pm8001_hwi.c
 index 96dcc09..d0feec5 100644
 --- a/drivers/scsi/pm8001/pm8001_hwi.c
 +++ b/drivers/scsi/pm8001/pm8001_hwi.c
 @@ -2642,6 +2642,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, 
 void *piomb)
 ts-resp = SAS_TASK_COMPLETE;
 ts-stat = SAS_OPEN_REJECT;
 ts-open_rej_reason = SAS_OREJ_RSVD_RETRY;
 +   break;
 default:
 PM8001_IO_DBG(pm8001_ha,
 pm8001_printk(Unknown status 0x%x\n, status));
 diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
 b/drivers/scsi/pm8001/pm80xx_hwi.c
 index 05cce46..18d4ac4 100644
 --- a/drivers/scsi/pm8001/pm80xx_hwi.c
 +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
 @@ -2314,6 +2314,7 @@ mpi_sata_completion(struct pm8001_hba_info *pm8001_ha, 
 void *piomb)
 ts-resp = SAS_TASK_COMPLETE;
 ts-stat = SAS_OPEN_REJECT;
 ts-open_rej_reason = SAS_OREJ_RSVD_RETRY;
 +   break;
 default:
 PM8001_IO_DBG(pm8001_ha,
 pm8001_printk(Unknown status 0x%x\n, status));
 --
 2.5.0

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 RESEND 18/23] pm8001: Use pci_enable_msix_exact() instead of pci_enable_msix()

2014-07-27 Thread Jack Wang
Hi Alex,

Looks Ok for me.
Please feel free to add my:
Reviewed-by: Jack Wang 

Thanks,
Jack

2014-07-26 10:33 GMT+02:00 Alexander Gordeev :
> On Wed, Jul 16, 2014 at 08:05:22PM +0200, Alexander Gordeev wrote:
>> As result of deprecation of MSI-X/MSI enablement functions
>> pci_enable_msix() and pci_enable_msi_block() all drivers
>> using these two interfaces need to be updated to use the
>> new pci_enable_msi_range()  or pci_enable_msi_exact()
>> and pci_enable_msix_range() or pci_enable_msix_exact()
>> interfaces.
>
> Hi Jack, Lindar,
>
> Could you please review this patch?
>
> Thanks!
>
>> Signed-off-by: Alexander Gordeev 
>> Cc: xjtu...@gmail.com
>> Cc: lindar_...@usish.com
>> Cc: linux-s...@vger.kernel.org
>> Cc: linux-...@vger.kernel.org
>> ---
>>  drivers/scsi/pm8001/pm8001_init.c |   39 
>> +++--
>>  1 files changed, 20 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
>> b/drivers/scsi/pm8001/pm8001_init.c
>> index e837ece..4057c24 100644
>> --- a/drivers/scsi/pm8001/pm8001_init.c
>> +++ b/drivers/scsi/pm8001/pm8001_init.c
>> @@ -729,34 +729,35 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
>> *pm8001_ha)
>>   sizeof(pm8001_ha->msix_entries[0]);
>>   for (i = 0; i < max_entry ; i++)
>>   pm8001_ha->msix_entries[i].entry = i;
>> - rc = pci_enable_msix(pm8001_ha->pdev, pm8001_ha->msix_entries,
>> + rc = pci_enable_msix_exact(pm8001_ha->pdev, pm8001_ha->msix_entries,
>>   number_of_intr);
>>   pm8001_ha->number_of_intr = number_of_intr;
>> - if (!rc) {
>> - PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
>> - "pci_enable_msix request ret:%d no of intr %d\n",
>> - rc, pm8001_ha->number_of_intr));
>> + if (rc)
>> + return rc;
>>
>> + PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
>> + "pci_enable_msix_exact request ret:%d no of intr %d\n",
>> + rc, pm8001_ha->number_of_intr));
>>
>> - for (i = 0; i < number_of_intr; i++) {
>> - snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
>> - DRV_NAME"%d", i);
>> - pm8001_ha->irq_vector[i].irq_id = i;
>> - pm8001_ha->irq_vector[i].drv_inst = pm8001_ha;
>> + for (i = 0; i < number_of_intr; i++) {
>> + snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
>> + DRV_NAME"%d", i);
>> + pm8001_ha->irq_vector[i].irq_id = i;
>> + pm8001_ha->irq_vector[i].drv_inst = pm8001_ha;
>>
>> - rc = request_irq(pm8001_ha->msix_entries[i].vector,
>> - pm8001_interrupt_handler_msix, flag,
>> - intr_drvname[i], &(pm8001_ha->irq_vector[i]));
>> - if (rc) {
>> - for (j = 0; j < i; j++)
>> - free_irq(
>> - pm8001_ha->msix_entries[j].vector,
>> + rc = request_irq(pm8001_ha->msix_entries[i].vector,
>> + pm8001_interrupt_handler_msix, flag,
>> + intr_drvname[i], &(pm8001_ha->irq_vector[i]));
>> + if (rc) {
>> + for (j = 0; j < i; j++) {
>> + free_irq(pm8001_ha->msix_entries[j].vector,
>>   &(pm8001_ha->irq_vector[i]));
>> - pci_disable_msix(pm8001_ha->pdev);
>> - break;
>>   }
>> + pci_disable_msix(pm8001_ha->pdev);
>> + break;
>>   }
>>   }
>> +
>>   return rc;
>>  }
>>  #endif
>> --
>> 1.7.7.6
>>
>
> --
> Regards,
> Alexander Gordeev
> agord...@redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 RESEND 18/23] pm8001: Use pci_enable_msix_exact() instead of pci_enable_msix()

2014-07-27 Thread Jack Wang
Hi Alex,

Looks Ok for me.
Please feel free to add my:
Reviewed-by: Jack Wang xjtu...@gmail.com

Thanks,
Jack

2014-07-26 10:33 GMT+02:00 Alexander Gordeev agord...@redhat.com:
 On Wed, Jul 16, 2014 at 08:05:22PM +0200, Alexander Gordeev wrote:
 As result of deprecation of MSI-X/MSI enablement functions
 pci_enable_msix() and pci_enable_msi_block() all drivers
 using these two interfaces need to be updated to use the
 new pci_enable_msi_range()  or pci_enable_msi_exact()
 and pci_enable_msix_range() or pci_enable_msix_exact()
 interfaces.

 Hi Jack, Lindar,

 Could you please review this patch?

 Thanks!

 Signed-off-by: Alexander Gordeev agord...@redhat.com
 Cc: xjtu...@gmail.com
 Cc: lindar_...@usish.com
 Cc: linux-s...@vger.kernel.org
 Cc: linux-...@vger.kernel.org
 ---
  drivers/scsi/pm8001/pm8001_init.c |   39 
 +++--
  1 files changed, 20 insertions(+), 19 deletions(-)

 diff --git a/drivers/scsi/pm8001/pm8001_init.c 
 b/drivers/scsi/pm8001/pm8001_init.c
 index e837ece..4057c24 100644
 --- a/drivers/scsi/pm8001/pm8001_init.c
 +++ b/drivers/scsi/pm8001/pm8001_init.c
 @@ -729,34 +729,35 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
 *pm8001_ha)
   sizeof(pm8001_ha-msix_entries[0]);
   for (i = 0; i  max_entry ; i++)
   pm8001_ha-msix_entries[i].entry = i;
 - rc = pci_enable_msix(pm8001_ha-pdev, pm8001_ha-msix_entries,
 + rc = pci_enable_msix_exact(pm8001_ha-pdev, pm8001_ha-msix_entries,
   number_of_intr);
   pm8001_ha-number_of_intr = number_of_intr;
 - if (!rc) {
 - PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
 - pci_enable_msix request ret:%d no of intr %d\n,
 - rc, pm8001_ha-number_of_intr));
 + if (rc)
 + return rc;

 + PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
 + pci_enable_msix_exact request ret:%d no of intr %d\n,
 + rc, pm8001_ha-number_of_intr));

 - for (i = 0; i  number_of_intr; i++) {
 - snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
 - DRV_NAME%d, i);
 - pm8001_ha-irq_vector[i].irq_id = i;
 - pm8001_ha-irq_vector[i].drv_inst = pm8001_ha;
 + for (i = 0; i  number_of_intr; i++) {
 + snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
 + DRV_NAME%d, i);
 + pm8001_ha-irq_vector[i].irq_id = i;
 + pm8001_ha-irq_vector[i].drv_inst = pm8001_ha;

 - rc = request_irq(pm8001_ha-msix_entries[i].vector,
 - pm8001_interrupt_handler_msix, flag,
 - intr_drvname[i], (pm8001_ha-irq_vector[i]));
 - if (rc) {
 - for (j = 0; j  i; j++)
 - free_irq(
 - pm8001_ha-msix_entries[j].vector,
 + rc = request_irq(pm8001_ha-msix_entries[i].vector,
 + pm8001_interrupt_handler_msix, flag,
 + intr_drvname[i], (pm8001_ha-irq_vector[i]));
 + if (rc) {
 + for (j = 0; j  i; j++) {
 + free_irq(pm8001_ha-msix_entries[j].vector,
   (pm8001_ha-irq_vector[i]));
 - pci_disable_msix(pm8001_ha-pdev);
 - break;
   }
 + pci_disable_msix(pm8001_ha-pdev);
 + break;
   }
   }
 +
   return rc;
  }
  #endif
 --
 1.7.7.6


 --
 Regards,
 Alexander Gordeev
 agord...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scsi: pm8001: pm80xx_hwi.c: Cleaning up variable is set more than once

2014-06-26 Thread Jack Wang
Thanks Rickard,

>From my point of view, looks good, but I'd like to get review from Anand
(cc-ed).

Anand, could you share your opinion?

Regards,
Jack

On 06/25/2014 04:01 PM, Rickard Strandqvist wrote:
> A struct member variable is set to different values without having used in 
> between.
> 
> This was found using a static code analysis program called cppcheck
> 
> Signed-off-by: Rickard Strandqvist 
> ---
>  drivers/scsi/pm8001/pm80xx_hwi.c |1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
> b/drivers/scsi/pm8001/pm80xx_hwi.c
> index d70587f..2698227 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> @@ -249,7 +249,6 @@ moreData:
>   sprintf(pm8001_ha->
>   forensic_info.data_buf.direct_data,
>   "%08x ", 4);
> - pm8001_ha->forensic_info.data_buf.read_len = 0x;
>   pm8001_ha->forensic_info.data_buf.direct_len =  0;
>   pm8001_ha->forensic_info.data_buf.direct_offset = 0;
>   pm8001_ha->forensic_info.data_buf.read_len = 0;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scsi: pm8001: pm80xx_hwi.c: Cleaning up uninitialized variables

2014-06-26 Thread Jack Wang
Looks good, thanks Rickard.
Acked-by: Jack Wang 
On 06/01/2014 03:13 PM, Rickard Strandqvist wrote:
> There is a risk that the variable will be used without being initialized.
> 
> This was largely found by using a static code analysis program called 
> cppcheck.
> 
> Signed-off-by: Rickard Strandqvist 
> ---
>  drivers/scsi/pm8001/pm80xx_hwi.c |6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
> b/drivers/scsi/pm8001/pm80xx_hwi.c
> index d70587f..add019b 100644
> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
> @@ -948,7 +948,7 @@ static int
>  pm80xx_get_encrypt_info(struct pm8001_hba_info *pm8001_ha)
>  {
>   u32 scratch3_value;
> - int ret;
> + int ret = -1;
>  
>   /* Read encryption status from SCRATCH PAD 3 */
>   scratch3_value = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_3);
> @@ -982,7 +982,7 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info *pm8001_ha)
>   pm8001_ha->encrypt_info.status = 0x;
>   pm8001_ha->encrypt_info.cipher_mode = 0;
>   pm8001_ha->encrypt_info.sec_mode = 0;
> - return 0;
> + ret = 0;
>   } else if ((scratch3_value & SCRATCH_PAD3_ENC_MASK) ==
>   SCRATCH_PAD3_ENC_DIS_ERR) {
>   pm8001_ha->encrypt_info.status =
> @@ -1004,7 +1004,6 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info 
> *pm8001_ha)
>   scratch3_value, pm8001_ha->encrypt_info.cipher_mode,
>   pm8001_ha->encrypt_info.sec_mode,
>   pm8001_ha->encrypt_info.status));
> - ret = -1;
>   } else if ((scratch3_value & SCRATCH_PAD3_ENC_MASK) ==
>SCRATCH_PAD3_ENC_ENA_ERR) {
>  
> @@ -1028,7 +1027,6 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info 
> *pm8001_ha)
>   scratch3_value, pm8001_ha->encrypt_info.cipher_mode,
>   pm8001_ha->encrypt_info.sec_mode,
>   pm8001_ha->encrypt_info.status));
> - ret = -1;
>   }
>   return ret;
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scsi: pm8001: pm80xx_hwi.c: Cleaning up uninitialized variables

2014-06-26 Thread Jack Wang
Looks good, thanks Rickard.
Acked-by: Jack Wang xjtu...@gmail.com
On 06/01/2014 03:13 PM, Rickard Strandqvist wrote:
 There is a risk that the variable will be used without being initialized.
 
 This was largely found by using a static code analysis program called 
 cppcheck.
 
 Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
 ---
  drivers/scsi/pm8001/pm80xx_hwi.c |6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
 b/drivers/scsi/pm8001/pm80xx_hwi.c
 index d70587f..add019b 100644
 --- a/drivers/scsi/pm8001/pm80xx_hwi.c
 +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
 @@ -948,7 +948,7 @@ static int
  pm80xx_get_encrypt_info(struct pm8001_hba_info *pm8001_ha)
  {
   u32 scratch3_value;
 - int ret;
 + int ret = -1;
  
   /* Read encryption status from SCRATCH PAD 3 */
   scratch3_value = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_3);
 @@ -982,7 +982,7 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info *pm8001_ha)
   pm8001_ha-encrypt_info.status = 0x;
   pm8001_ha-encrypt_info.cipher_mode = 0;
   pm8001_ha-encrypt_info.sec_mode = 0;
 - return 0;
 + ret = 0;
   } else if ((scratch3_value  SCRATCH_PAD3_ENC_MASK) ==
   SCRATCH_PAD3_ENC_DIS_ERR) {
   pm8001_ha-encrypt_info.status =
 @@ -1004,7 +1004,6 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info 
 *pm8001_ha)
   scratch3_value, pm8001_ha-encrypt_info.cipher_mode,
   pm8001_ha-encrypt_info.sec_mode,
   pm8001_ha-encrypt_info.status));
 - ret = -1;
   } else if ((scratch3_value  SCRATCH_PAD3_ENC_MASK) ==
SCRATCH_PAD3_ENC_ENA_ERR) {
  
 @@ -1028,7 +1027,6 @@ pm80xx_get_encrypt_info(struct pm8001_hba_info 
 *pm8001_ha)
   scratch3_value, pm8001_ha-encrypt_info.cipher_mode,
   pm8001_ha-encrypt_info.sec_mode,
   pm8001_ha-encrypt_info.status));
 - ret = -1;
   }
   return ret;
  }
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scsi: pm8001: pm80xx_hwi.c: Cleaning up variable is set more than once

2014-06-26 Thread Jack Wang
Thanks Rickard,

From my point of view, looks good, but I'd like to get review from Anand
(cc-ed).

Anand, could you share your opinion?

Regards,
Jack

On 06/25/2014 04:01 PM, Rickard Strandqvist wrote:
 A struct member variable is set to different values without having used in 
 between.
 
 This was found using a static code analysis program called cppcheck
 
 Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
 ---
  drivers/scsi/pm8001/pm80xx_hwi.c |1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c 
 b/drivers/scsi/pm8001/pm80xx_hwi.c
 index d70587f..2698227 100644
 --- a/drivers/scsi/pm8001/pm80xx_hwi.c
 +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
 @@ -249,7 +249,6 @@ moreData:
   sprintf(pm8001_ha-
   forensic_info.data_buf.direct_data,
   %08x , 4);
 - pm8001_ha-forensic_info.data_buf.read_len = 0x;
   pm8001_ha-forensic_info.data_buf.direct_len =  0;
   pm8001_ha-forensic_info.data_buf.direct_offset = 0;
   pm8001_ha-forensic_info.data_buf.read_len = 0;
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/22] pm8001: Use pci_enable_msix_range()

2014-02-04 Thread Jack Wang
On 02/04/2014 12:17 PM, Alexander Gordeev wrote:
> As result of deprecation of MSI-X/MSI enablement functions
> pci_enable_msix() and pci_enable_msi_block() all drivers
> using these two interfaces need to be updated to use the
> new pci_enable_msi_range() and pci_enable_msix_range()
> interfaces.
> 
> Signed-off-by: Alexander Gordeev 
> Cc: xjtu...@gmail.com
> Cc: lindar_...@usish.com
> Cc: linux-s...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/scsi/pm8001/pm8001_init.c |   47 
> +++--
>  1 files changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
> b/drivers/scsi/pm8001/pm8001_init.c
> index efffbb9..2d4b06e 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -724,34 +724,35 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
> *pm8001_ha)
>   sizeof(pm8001_ha->msix_entries[0]);
>   for (i = 0; i < max_entry ; i++)
>   pm8001_ha->msix_entries[i].entry = i;
> - rc = pci_enable_msix(pm8001_ha->pdev, pm8001_ha->msix_entries,
> - number_of_intr);
> + rc = pci_enable_msix_range(pm8001_ha->pdev, pm8001_ha->msix_entries,
> + number_of_intr, number_of_intr);
>   pm8001_ha->number_of_intr = number_of_intr;
> - if (!rc) {
> - PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
> - "pci_enable_msix request ret:%d no of intr %d\n",
> - rc, pm8001_ha->number_of_intr));
> -
> -
> - for (i = 0; i < number_of_intr; i++) {
> - snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
> - DRV_NAME"%d", i);
> - pm8001_ha->irq_vector[i].irq_id = i;
> - pm8001_ha->irq_vector[i].drv_inst = pm8001_ha;
> -
> - rc = request_irq(pm8001_ha->msix_entries[i].vector,
> - pm8001_interrupt_handler_msix, flag,
> - intr_drvname[i], &(pm8001_ha->irq_vector[i]));
> - if (rc) {
> - for (j = 0; j < i; j++)
> - free_irq(
> - pm8001_ha->msix_entries[j].vector,
> + if (rc < 0)
> + return rc;
> +
> + PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
> + "pci_enable_msix request ret:%d no of intr %d\n",
> + rc, pm8001_ha->number_of_intr));
> +
> + for (i = 0; i < number_of_intr; i++) {
> + snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
> + DRV_NAME"%d", i);
> + pm8001_ha->irq_vector[i].irq_id = i;
> + pm8001_ha->irq_vector[i].drv_inst = pm8001_ha;
> +
> + rc = request_irq(pm8001_ha->msix_entries[i].vector,
> + pm8001_interrupt_handler_msix, flag,
> + intr_drvname[i], &(pm8001_ha->irq_vector[i]));
> + if (rc) {
> + for (j = 0; j < i; j++) {
> + free_irq(pm8001_ha->msix_entries[j].vector,
>   &(pm8001_ha->irq_vector[i]));
> - pci_disable_msix(pm8001_ha->pdev);
> - break;
>   }
> + pci_disable_msix(pm8001_ha->pdev);
> + break;
>   }
>   }
> +
>   return rc;
>  }
>  #endif
> 
Thanks, looks fine.
Acked-by: Jack Wang 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/22] pm8001: Fix invalid success return when request_irq() failed

2014-02-04 Thread Jack Wang
On 02/04/2014 12:17 PM, Alexander Gordeev wrote:
> When enabling MSI-X if a call to request_irq() failed
> pm8001_setup_msix() still returns success. This udate
> fixes the described misbehaviour.
> 
> Signed-off-by: Alexander Gordeev 
> Cc: xjtu...@gmail.com
> Cc: lindar_...@usish.com
> Cc: linux-s...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/scsi/pm8001/pm8001_init.c |5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
> b/drivers/scsi/pm8001/pm8001_init.c
> index 73a120d..efffbb9 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -739,9 +739,10 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
> *pm8001_ha)
>   pm8001_ha->irq_vector[i].irq_id = i;
>   pm8001_ha->irq_vector[i].drv_inst = pm8001_ha;
>  
> - if (request_irq(pm8001_ha->msix_entries[i].vector,
> + rc = request_irq(pm8001_ha->msix_entries[i].vector,
>   pm8001_interrupt_handler_msix, flag,
> - intr_drvname[i], &(pm8001_ha->irq_vector[i]))) {
> + intr_drvname[i], &(pm8001_ha->irq_vector[i]));
> + if (rc) {
>   for (j = 0; j < i; j++)
>   free_irq(
>   pm8001_ha->msix_entries[j].vector,
> 
Thanks, looks fine.
Acked-by: Jack Wang 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 16/22] pm8001: Fix invalid success return when request_irq() failed

2014-02-04 Thread Jack Wang
On 02/04/2014 12:17 PM, Alexander Gordeev wrote:
 When enabling MSI-X if a call to request_irq() failed
 pm8001_setup_msix() still returns success. This udate
 fixes the described misbehaviour.
 
 Signed-off-by: Alexander Gordeev agord...@redhat.com
 Cc: xjtu...@gmail.com
 Cc: lindar_...@usish.com
 Cc: linux-s...@vger.kernel.org
 Cc: linux-...@vger.kernel.org
 ---
  drivers/scsi/pm8001/pm8001_init.c |5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/scsi/pm8001/pm8001_init.c 
 b/drivers/scsi/pm8001/pm8001_init.c
 index 73a120d..efffbb9 100644
 --- a/drivers/scsi/pm8001/pm8001_init.c
 +++ b/drivers/scsi/pm8001/pm8001_init.c
 @@ -739,9 +739,10 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
 *pm8001_ha)
   pm8001_ha-irq_vector[i].irq_id = i;
   pm8001_ha-irq_vector[i].drv_inst = pm8001_ha;
  
 - if (request_irq(pm8001_ha-msix_entries[i].vector,
 + rc = request_irq(pm8001_ha-msix_entries[i].vector,
   pm8001_interrupt_handler_msix, flag,
 - intr_drvname[i], (pm8001_ha-irq_vector[i]))) {
 + intr_drvname[i], (pm8001_ha-irq_vector[i]));
 + if (rc) {
   for (j = 0; j  i; j++)
   free_irq(
   pm8001_ha-msix_entries[j].vector,
 
Thanks, looks fine.
Acked-by: Jack Wang xjtu...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 17/22] pm8001: Use pci_enable_msix_range()

2014-02-04 Thread Jack Wang
On 02/04/2014 12:17 PM, Alexander Gordeev wrote:
 As result of deprecation of MSI-X/MSI enablement functions
 pci_enable_msix() and pci_enable_msi_block() all drivers
 using these two interfaces need to be updated to use the
 new pci_enable_msi_range() and pci_enable_msix_range()
 interfaces.
 
 Signed-off-by: Alexander Gordeev agord...@redhat.com
 Cc: xjtu...@gmail.com
 Cc: lindar_...@usish.com
 Cc: linux-s...@vger.kernel.org
 Cc: linux-...@vger.kernel.org
 ---
  drivers/scsi/pm8001/pm8001_init.c |   47 
 +++--
  1 files changed, 24 insertions(+), 23 deletions(-)
 
 diff --git a/drivers/scsi/pm8001/pm8001_init.c 
 b/drivers/scsi/pm8001/pm8001_init.c
 index efffbb9..2d4b06e 100644
 --- a/drivers/scsi/pm8001/pm8001_init.c
 +++ b/drivers/scsi/pm8001/pm8001_init.c
 @@ -724,34 +724,35 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info 
 *pm8001_ha)
   sizeof(pm8001_ha-msix_entries[0]);
   for (i = 0; i  max_entry ; i++)
   pm8001_ha-msix_entries[i].entry = i;
 - rc = pci_enable_msix(pm8001_ha-pdev, pm8001_ha-msix_entries,
 - number_of_intr);
 + rc = pci_enable_msix_range(pm8001_ha-pdev, pm8001_ha-msix_entries,
 + number_of_intr, number_of_intr);
   pm8001_ha-number_of_intr = number_of_intr;
 - if (!rc) {
 - PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
 - pci_enable_msix request ret:%d no of intr %d\n,
 - rc, pm8001_ha-number_of_intr));
 -
 -
 - for (i = 0; i  number_of_intr; i++) {
 - snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
 - DRV_NAME%d, i);
 - pm8001_ha-irq_vector[i].irq_id = i;
 - pm8001_ha-irq_vector[i].drv_inst = pm8001_ha;
 -
 - rc = request_irq(pm8001_ha-msix_entries[i].vector,
 - pm8001_interrupt_handler_msix, flag,
 - intr_drvname[i], (pm8001_ha-irq_vector[i]));
 - if (rc) {
 - for (j = 0; j  i; j++)
 - free_irq(
 - pm8001_ha-msix_entries[j].vector,
 + if (rc  0)
 + return rc;
 +
 + PM8001_INIT_DBG(pm8001_ha, pm8001_printk(
 + pci_enable_msix request ret:%d no of intr %d\n,
 + rc, pm8001_ha-number_of_intr));
 +
 + for (i = 0; i  number_of_intr; i++) {
 + snprintf(intr_drvname[i], sizeof(intr_drvname[0]),
 + DRV_NAME%d, i);
 + pm8001_ha-irq_vector[i].irq_id = i;
 + pm8001_ha-irq_vector[i].drv_inst = pm8001_ha;
 +
 + rc = request_irq(pm8001_ha-msix_entries[i].vector,
 + pm8001_interrupt_handler_msix, flag,
 + intr_drvname[i], (pm8001_ha-irq_vector[i]));
 + if (rc) {
 + for (j = 0; j  i; j++) {
 + free_irq(pm8001_ha-msix_entries[j].vector,
   (pm8001_ha-irq_vector[i]));
 - pci_disable_msix(pm8001_ha-pdev);
 - break;
   }
 + pci_disable_msix(pm8001_ha-pdev);
 + break;
   }
   }
 +
   return rc;
  }
  #endif
 
Thanks, looks fine.
Acked-by: Jack Wang xjtu...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [STABLE] find missing bug fixes in a stable kernel

2014-01-13 Thread Jack Wang
On 01/13/2014 08:28 AM, Li Zefan wrote:
> We have several long-term and extended stable kernels, and it's possible
> that a bug fix is in some stable versions but is missing in some other
> versions, so I've written a script to find out those fixes.
> 
> Take 3.4.xx and 3.2.xx for example. If a bug fix was merged into upstream
> kernel after 3.4, and then it was backported to 3.2.xx, then it probably
> needs to be backported to 3.4.xx.
> 
> The result is, there're ~430 bug fixes in 3.2.xx that probably need to be
> backported to 3.4.xx. Given there're about 4500 commits in 3.2.xx, that
> is ~10%, which is quite a big number for stable kernels.
> 
> We (our team in Huawei) are going to go through the whole list to filter
> out fixes that're applicable for 3.4.xx.
> 
> I've attached the lists for 3.4 and 3.10.
> 
> If a commit ID appears more than once in changelogs, it's possible that's
> because the commit was reverted later, so I tagged this kind of commits
> in the lists.
> 
Hello Zefan,

Thanks for share, great job, it's very useful info for other companies
who build their kernel base on long term kernel like here in ProfitBricks.

I'm a little confused about the column occurrences, what do the 2
numbers mean, eg :
8c4f3c3fa968 874d3954a35c 2 1
the first is the occurrences in upstream and second for the occurrence
in 3.2.xx right?

Regards,
Jack



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [STABLE] find missing bug fixes in a stable kernel

2014-01-13 Thread Jack Wang
On 01/13/2014 08:28 AM, Li Zefan wrote:
 We have several long-term and extended stable kernels, and it's possible
 that a bug fix is in some stable versions but is missing in some other
 versions, so I've written a script to find out those fixes.
 
 Take 3.4.xx and 3.2.xx for example. If a bug fix was merged into upstream
 kernel after 3.4, and then it was backported to 3.2.xx, then it probably
 needs to be backported to 3.4.xx.
 
 The result is, there're ~430 bug fixes in 3.2.xx that probably need to be
 backported to 3.4.xx. Given there're about 4500 commits in 3.2.xx, that
 is ~10%, which is quite a big number for stable kernels.
 
 We (our team in Huawei) are going to go through the whole list to filter
 out fixes that're applicable for 3.4.xx.
 
 I've attached the lists for 3.4 and 3.10.
 
 If a commit ID appears more than once in changelogs, it's possible that's
 because the commit was reverted later, so I tagged this kind of commits
 in the lists.
 
Hello Zefan,

Thanks for share, great job, it's very useful info for other companies
who build their kernel base on long term kernel like here in ProfitBricks.

I'm a little confused about the column occurrences, what do the 2
numbers mean, eg :
8c4f3c3fa968 874d3954a35c 2 1
the first is the occurrences in upstream and second for the occurrence
in 3.2.xx right?

Regards,
Jack



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2014-01-06 Thread Jack Wang
On 01/04/2014 07:09 AM, Al Viro wrote:
> On Thu, Jan 02, 2014 at 10:36:30AM +0100, Jack Wang wrote:
> 
>>> Bug happened at line 1486, looks disk->fops is NULL here for some
>>> reason, is it reasonable to add a check like:
>>>
>>> if (disk->fops)
>>> if (disk->fops->release)
>>> ret = disk->fops->release(disk, mode);
>>>
>>>
>>> Happy New Year and Best regards:)
>>> Jack
>>>
>>
>> Ping, could you share opnions on this, attached with patch I proposaled.
> 
> Sorry, had been sick since mid-December ;-/  The patch is not a good idea -
> in the best case it's papering over a bug (and insufficiently so, at that,
> since there are other places where disk->fops->some_method is checked).
> 
> gendisk->fops should never be assigned NULL; it starts life with NULL
> ->fops, but that should be assigned a non-NULL value (and never modified
> afterwards) before anyone can see it.  Moreover, even if some driver has
> fscked up and forgot to initialize the damn thing, get_gendisk() would've
> refused to return such a thing to any callers (including __blkdev_get()).
> Note that __blkdev_get() would oops on such a thing if get_gendisk()
> somehow returned it.
> 
> Looks like something is shitting over bdev->bd_disk or bdev->bd_disk->fops.
> The offsets in the disassembled code are all wrong (including that from
> beginning of function to oopsing instruction), but the code match is good,
> so I agree that we are hitting bdev->bd_disk->fops == NULL here.  The
> question is how it has happened - that's where the real bug is...
> 
> How reproducible it is?  And which kernel, while we are at it?  This area
> didn't get a lot of changes lately, but still...
> 
Thanks Al for reply, and look into this.
We're using 3.4.71, and this happened in production, we can not
reproduce it yet. What I could see is: before this happened, we saw scsi
devices offlined, and multipath failed path, raid1 failed member device.

Possible the bug lies in drivers md-raid1, dm-multipath or sd? How could
I narrow it down? Could you teach me?

Thanks, wish you happy and healthy!

Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2014-01-06 Thread Jack Wang
On 01/04/2014 07:09 AM, Al Viro wrote:
 On Thu, Jan 02, 2014 at 10:36:30AM +0100, Jack Wang wrote:
 
 Bug happened at line 1486, looks disk-fops is NULL here for some
 reason, is it reasonable to add a check like:

 if (disk-fops)
 if (disk-fops-release)
 ret = disk-fops-release(disk, mode);


 Happy New Year and Best regards:)
 Jack


 Ping, could you share opnions on this, attached with patch I proposaled.
 
 Sorry, had been sick since mid-December ;-/  The patch is not a good idea -
 in the best case it's papering over a bug (and insufficiently so, at that,
 since there are other places where disk-fops-some_method is checked).
 
 gendisk-fops should never be assigned NULL; it starts life with NULL
 -fops, but that should be assigned a non-NULL value (and never modified
 afterwards) before anyone can see it.  Moreover, even if some driver has
 fscked up and forgot to initialize the damn thing, get_gendisk() would've
 refused to return such a thing to any callers (including __blkdev_get()).
 Note that __blkdev_get() would oops on such a thing if get_gendisk()
 somehow returned it.
 
 Looks like something is shitting over bdev-bd_disk or bdev-bd_disk-fops.
 The offsets in the disassembled code are all wrong (including that from
 beginning of function to oopsing instruction), but the code match is good,
 so I agree that we are hitting bdev-bd_disk-fops == NULL here.  The
 question is how it has happened - that's where the real bug is...
 
 How reproducible it is?  And which kernel, while we are at it?  This area
 didn't get a lot of changes lately, but still...
 
Thanks Al for reply, and look into this.
We're using 3.4.71, and this happened in production, we can not
reproduce it yet. What I could see is: before this happened, we saw scsi
devices offlined, and multipath failed path, raid1 failed member device.

Possible the bug lies in drivers md-raid1, dm-multipath or sd? How could
I narrow it down? Could you teach me?

Thanks, wish you happy and healthy!

Jack
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2014-01-02 Thread Jack Wang
On 12/30/2013 04:55 PM, Jack Wang wrote:
> Hi,
> 
> We saw NULL pointer dereference below:
> 
> Dec 28 16:24:26 server kernel: [979193.076399] BUG: unable to handle
> kernel NULL pointer dereference at 0008
> Dec 28 16:24:26 server kernel: [979193.076401] IP: []
> __blkdev_put+0x17f/0x1d0
> Dec 28 16:24:26 server kernel: [979193.076408] PGD 4bdcaa067 PUD
> 4bdc43067 PMD 0
> Dec 28 16:24:26 server kernel: [979193.076410] Oops:  [#1] SMP
> Dec 28 16:24:26 server kernel: [979193.076412] CPU 6
> Dec 28 16:24:26 server kernel: [979193.076413] Modules linked in: bridge
> stp llc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
> raid1 md_mod dm_round_robin sd_mod crc_t10dif ib_srp scsi_transport_srp
> scsi_tgt xt_ETHOIP6(O) x_tables vhost_net(O) macvtap macvlan tun(O)
> nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 rdma_ucm rdma_cm iw_cm
> ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_qib mlx4_ib ib_mthca
> ib_mad ib_core dm_multipath scsi_dh scsi_mod kvm_amd kvm powernow_k8
> mperf psmouse crc32c_intel microcode tpm_tis tpm tpm_bios serio_raw
> evdev amd64_edac_mod edac_core edac_mce_amd i2c_piix4 button processor
> thermal_sys mlx4_core
> Dec 28 16:24:26 server kernel: [979193.076440]
> Dec 28 16:24:26 server kernel: [979193.076442] Pid: 56544, comm:
> multipath Tainted: G   O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
> Dec 28 16:24:26 server kernel: [979193.076445] RIP:
> 0010:[]  [] __blkdev_put+0x17f/0x1d0
> Dec 28 16:24:26 server kernel: [979193.076448] RSP:
> 0018:882802f4beb8  EFLAGS: 00010246
> Dec 28 16:24:26 server kernel: [979193.076449] RAX: 
> RBX: 881ff78b0d00 RCX: 0001
> Dec 28 16:24:26 server kernel: [979193.076451] RDX: 
> RSI: 001d RDI: 881ff78b0d18
> Dec 28 16:24:26 server kernel: [979193.076452] RBP: 
> R08:  R09: 
> Dec 28 16:24:26 server kernel: [979193.076453] R10: 
> R11: 0246 R12: 001d
> Dec 28 16:24:26 server kernel: [979193.076455] R13: 881ff78b0d18
> R14: 8807f9e7f400 R15: 8804a8d77710
> Dec 28 16:24:26 server kernel: [979193.076457] FS:
> 7ff8c80fe7a0() GS:880807d8() knlGS:
> Dec 28 16:24:26 server kernel: [979193.076458] CS:  0010 DS:  ES:
>  CR0: 80050033
> Dec 28 16:24:26 server kernel: [979193.076460] CR2: 0008
> CR3: 00064765f000 CR4: 000407e0
> Dec 28 16:24:26 server kernel: [979193.076461] DR0: 
> DR1:  DR2: 
> Dec 28 16:24:26 server kernel: [979193.076463] DR3: 
> DR6: 0ff0 DR7: 0400
> Dec 28 16:24:26 server kernel: [979193.076464] Process multipath (pid:
> 56544, threadinfo 882802f4a000, task 8828020106d0)
> Dec 28 16:24:26 server kernel: [979193.076466] Stack:
> Dec 28 16:24:26 server kernel: [979193.076466]  
>  880803cf2580 8804a8d77700
> Dec 28 16:24:26 server kernel: [979193.076468]  0010
> 88100363eff0 881004609b00 882003c20020
> Dec 28 16:24:26 server kernel: [979193.076470]  8804a8d77710
> 81136bad 7fffbdc8f420 8804a8d77700
> Dec 28 16:24:26 server kernel: [979193.076472] Call Trace:
> Dec 28 16:24:26 server kernel: [979193.076477]  [] ?
> fput+0xdd/0x270
> Dec 28 16:24:26 server kernel: [979193.076479]  [] ?
> filp_close+0x5c/0x90
> Dec 28 16:24:26 server kernel: [979193.076481]  [] ?
> sys_close+0x71/0xc0
> Dec 28 16:24:26 server kernel: [979193.076484]  [] ?
> system_call_fastpath+0x16/0x1b
> Dec 28 16:24:26 server kernel: [979193.076486] Code: 8b 5c 24 18 48 8b
> 6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48
> 83 c4 48 c3 66 90 49 8b 86 48 03 00 00 <48> 8b 40 08 48 85 c0 0f 84 fc
> fe ff ff 44 89 e6 4c 89 f7 ff d0
> Dec 28 16:24:26 server kernel: [979193.076500] RIP  []
> __blkdev_put+0x17f/0x1d0
> Dec 28 16:24:26 server kernel: [979193.076503]  RSP 
> Dec 28 16:24:26 server kernel: [979193.076504] CR2: 0008
> Dec 28 16:24:26 server kernel: [979193.077599] ---[ end trace
> 23f39da823d257f9 ]---
> 
> disassamble results show:
> 1465  static int __blkdev_put(struct block_device *bdev, fmode_t mode,
> int for_part)
> 1466  {
>0x81162d10 <+0>:   sub$0x48,%rsp
>0x81162d14 <+4>:   mov%r13,0x30(%rsp)
>0x81162d1d <+13>:  mov%rbx,0x18(%rsp)
>0x81162d22 <+18>:  mov%rbp,0x20(%rsp)
>0x81162d27 <+23>:  mov%r12,0x28(%rsp)
>0x81162d2c <+28>:  mov%edx,%ebp
>0x81162d2e <

Re: [BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2014-01-02 Thread Jack Wang
On 12/30/2013 04:55 PM, Jack Wang wrote:
 Hi,
 
 We saw NULL pointer dereference below:
 
 Dec 28 16:24:26 server kernel: [979193.076399] BUG: unable to handle
 kernel NULL pointer dereference at 0008
 Dec 28 16:24:26 server kernel: [979193.076401] IP: [8116952f]
 __blkdev_put+0x17f/0x1d0
 Dec 28 16:24:26 server kernel: [979193.076408] PGD 4bdcaa067 PUD
 4bdc43067 PMD 0
 Dec 28 16:24:26 server kernel: [979193.076410] Oops:  [#1] SMP
 Dec 28 16:24:26 server kernel: [979193.076412] CPU 6
 Dec 28 16:24:26 server kernel: [979193.076413] Modules linked in: bridge
 stp llc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
 raid1 md_mod dm_round_robin sd_mod crc_t10dif ib_srp scsi_transport_srp
 scsi_tgt xt_ETHOIP6(O) x_tables vhost_net(O) macvtap macvlan tun(O)
 nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 rdma_ucm rdma_cm iw_cm
 ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_qib mlx4_ib ib_mthca
 ib_mad ib_core dm_multipath scsi_dh scsi_mod kvm_amd kvm powernow_k8
 mperf psmouse crc32c_intel microcode tpm_tis tpm tpm_bios serio_raw
 evdev amd64_edac_mod edac_core edac_mce_amd i2c_piix4 button processor
 thermal_sys mlx4_core
 Dec 28 16:24:26 server kernel: [979193.076440]
 Dec 28 16:24:26 server kernel: [979193.076442] Pid: 56544, comm:
 multipath Tainted: G   O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
 Dec 28 16:24:26 server kernel: [979193.076445] RIP:
 0010:[8116952f]  [8116952f] __blkdev_put+0x17f/0x1d0
 Dec 28 16:24:26 server kernel: [979193.076448] RSP:
 0018:882802f4beb8  EFLAGS: 00010246
 Dec 28 16:24:26 server kernel: [979193.076449] RAX: 
 RBX: 881ff78b0d00 RCX: 0001
 Dec 28 16:24:26 server kernel: [979193.076451] RDX: 
 RSI: 001d RDI: 881ff78b0d18
 Dec 28 16:24:26 server kernel: [979193.076452] RBP: 
 R08:  R09: 
 Dec 28 16:24:26 server kernel: [979193.076453] R10: 
 R11: 0246 R12: 001d
 Dec 28 16:24:26 server kernel: [979193.076455] R13: 881ff78b0d18
 R14: 8807f9e7f400 R15: 8804a8d77710
 Dec 28 16:24:26 server kernel: [979193.076457] FS:
 7ff8c80fe7a0() GS:880807d8() knlGS:
 Dec 28 16:24:26 server kernel: [979193.076458] CS:  0010 DS:  ES:
  CR0: 80050033
 Dec 28 16:24:26 server kernel: [979193.076460] CR2: 0008
 CR3: 00064765f000 CR4: 000407e0
 Dec 28 16:24:26 server kernel: [979193.076461] DR0: 
 DR1:  DR2: 
 Dec 28 16:24:26 server kernel: [979193.076463] DR3: 
 DR6: 0ff0 DR7: 0400
 Dec 28 16:24:26 server kernel: [979193.076464] Process multipath (pid:
 56544, threadinfo 882802f4a000, task 8828020106d0)
 Dec 28 16:24:26 server kernel: [979193.076466] Stack:
 Dec 28 16:24:26 server kernel: [979193.076466]  
  880803cf2580 8804a8d77700
 Dec 28 16:24:26 server kernel: [979193.076468]  0010
 88100363eff0 881004609b00 882003c20020
 Dec 28 16:24:26 server kernel: [979193.076470]  8804a8d77710
 81136bad 7fffbdc8f420 8804a8d77700
 Dec 28 16:24:26 server kernel: [979193.076472] Call Trace:
 Dec 28 16:24:26 server kernel: [979193.076477]  [81136bad] ?
 fput+0xdd/0x270
 Dec 28 16:24:26 server kernel: [979193.076479]  [81132f0c] ?
 filp_close+0x5c/0x90
 Dec 28 16:24:26 server kernel: [979193.076481]  [81132fb1] ?
 sys_close+0x71/0xc0
 Dec 28 16:24:26 server kernel: [979193.076484]  [816801b9] ?
 system_call_fastpath+0x16/0x1b
 Dec 28 16:24:26 server kernel: [979193.076486] Code: 8b 5c 24 18 48 8b
 6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48
 83 c4 48 c3 66 90 49 8b 86 48 03 00 00 48 8b 40 08 48 85 c0 0f 84 fc
 fe ff ff 44 89 e6 4c 89 f7 ff d0
 Dec 28 16:24:26 server kernel: [979193.076500] RIP  [8116952f]
 __blkdev_put+0x17f/0x1d0
 Dec 28 16:24:26 server kernel: [979193.076503]  RSP 882802f4beb8
 Dec 28 16:24:26 server kernel: [979193.076504] CR2: 0008
 Dec 28 16:24:26 server kernel: [979193.077599] ---[ end trace
 23f39da823d257f9 ]---
 
 disassamble results show:
 1465  static int __blkdev_put(struct block_device *bdev, fmode_t mode,
 int for_part)
 1466  {
0x81162d10 +0:   sub$0x48,%rsp
0x81162d14 +4:   mov%r13,0x30(%rsp)
0x81162d1d +13:  mov%rbx,0x18(%rsp)
0x81162d22 +18:  mov%rbp,0x20(%rsp)
0x81162d27 +23:  mov%r12,0x28(%rsp)
0x81162d2c +28:  mov%edx,%ebp
0x81162d2e +30:  mov%r14,0x38(%rsp)
0x81162d33 +35:  mov%r15,0x40(%rsp)
0x81162d38 +40:  mov%rdi,%rbx
0x81162d45 +53:  mov%esi,%r12d
 
 1467  int ret = 0;
0x81162d8e +126: xor%ebp,%ebp
 
 1468  struct

[BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2013-12-30 Thread Jack Wang
Hi,

We saw NULL pointer dereference below:

Dec 28 16:24:26 server kernel: [979193.076399] BUG: unable to handle
kernel NULL pointer dereference at 0008
Dec 28 16:24:26 server kernel: [979193.076401] IP: []
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076408] PGD 4bdcaa067 PUD
4bdc43067 PMD 0
Dec 28 16:24:26 server kernel: [979193.076410] Oops:  [#1] SMP
Dec 28 16:24:26 server kernel: [979193.076412] CPU 6
Dec 28 16:24:26 server kernel: [979193.076413] Modules linked in: bridge
stp llc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
raid1 md_mod dm_round_robin sd_mod crc_t10dif ib_srp scsi_transport_srp
scsi_tgt xt_ETHOIP6(O) x_tables vhost_net(O) macvtap macvlan tun(O)
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 rdma_ucm rdma_cm iw_cm
ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_qib mlx4_ib ib_mthca
ib_mad ib_core dm_multipath scsi_dh scsi_mod kvm_amd kvm powernow_k8
mperf psmouse crc32c_intel microcode tpm_tis tpm tpm_bios serio_raw
evdev amd64_edac_mod edac_core edac_mce_amd i2c_piix4 button processor
thermal_sys mlx4_core
Dec 28 16:24:26 server kernel: [979193.076440]
Dec 28 16:24:26 server kernel: [979193.076442] Pid: 56544, comm:
multipath Tainted: G   O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
Dec 28 16:24:26 server kernel: [979193.076445] RIP:
0010:[]  [] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076448] RSP:
0018:882802f4beb8  EFLAGS: 00010246
Dec 28 16:24:26 server kernel: [979193.076449] RAX: 
RBX: 881ff78b0d00 RCX: 0001
Dec 28 16:24:26 server kernel: [979193.076451] RDX: 
RSI: 001d RDI: 881ff78b0d18
Dec 28 16:24:26 server kernel: [979193.076452] RBP: 
R08:  R09: 
Dec 28 16:24:26 server kernel: [979193.076453] R10: 
R11: 0246 R12: 001d
Dec 28 16:24:26 server kernel: [979193.076455] R13: 881ff78b0d18
R14: 8807f9e7f400 R15: 8804a8d77710
Dec 28 16:24:26 server kernel: [979193.076457] FS:
7ff8c80fe7a0() GS:880807d8() knlGS:
Dec 28 16:24:26 server kernel: [979193.076458] CS:  0010 DS:  ES:
 CR0: 80050033
Dec 28 16:24:26 server kernel: [979193.076460] CR2: 0008
CR3: 00064765f000 CR4: 000407e0
Dec 28 16:24:26 server kernel: [979193.076461] DR0: 
DR1:  DR2: 
Dec 28 16:24:26 server kernel: [979193.076463] DR3: 
DR6: 0ff0 DR7: 0400
Dec 28 16:24:26 server kernel: [979193.076464] Process multipath (pid:
56544, threadinfo 882802f4a000, task 8828020106d0)
Dec 28 16:24:26 server kernel: [979193.076466] Stack:
Dec 28 16:24:26 server kernel: [979193.076466]  
 880803cf2580 8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076468]  0010
88100363eff0 881004609b00 882003c20020
Dec 28 16:24:26 server kernel: [979193.076470]  8804a8d77710
81136bad 7fffbdc8f420 8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076472] Call Trace:
Dec 28 16:24:26 server kernel: [979193.076477]  [] ?
fput+0xdd/0x270
Dec 28 16:24:26 server kernel: [979193.076479]  [] ?
filp_close+0x5c/0x90
Dec 28 16:24:26 server kernel: [979193.076481]  [] ?
sys_close+0x71/0xc0
Dec 28 16:24:26 server kernel: [979193.076484]  [] ?
system_call_fastpath+0x16/0x1b
Dec 28 16:24:26 server kernel: [979193.076486] Code: 8b 5c 24 18 48 8b
6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48
83 c4 48 c3 66 90 49 8b 86 48 03 00 00 <48> 8b 40 08 48 85 c0 0f 84 fc
fe ff ff 44 89 e6 4c 89 f7 ff d0
Dec 28 16:24:26 server kernel: [979193.076500] RIP  []
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076503]  RSP 
Dec 28 16:24:26 server kernel: [979193.076504] CR2: 0008
Dec 28 16:24:26 server kernel: [979193.077599] ---[ end trace
23f39da823d257f9 ]---

disassamble results show:
1465static int __blkdev_put(struct block_device *bdev, fmode_t mode,
int for_part)
1466{
   0x81162d10 <+0>: sub$0x48,%rsp
   0x81162d14 <+4>: mov%r13,0x30(%rsp)
   0x81162d1d <+13>:mov%rbx,0x18(%rsp)
   0x81162d22 <+18>:mov%rbp,0x20(%rsp)
   0x81162d27 <+23>:mov%r12,0x28(%rsp)
   0x81162d2c <+28>:mov%edx,%ebp
   0x81162d2e <+30>:mov%r14,0x38(%rsp)
   0x81162d33 <+35>:mov%r15,0x40(%rsp)
   0x81162d38 <+40>:mov%rdi,%rbx
   0x81162d45 <+53>:mov%esi,%r12d

1467int ret = 0;
   0x81162d8e <+126>:   xor%ebp,%ebp

1468struct gendisk *disk = bdev->bd_disk;
   0x81162d3b <+43>:mov0x90(%rdi),%r14

1469struct block_device *victim = NULL;
1470
1471mutex_lock_nested(>bd_mutex, for_part);
   0x81162d19 

[BUG]NULL pointer dereference at 0000000000000008 __blkdev_put+0x17f/0x1d0

2013-12-30 Thread Jack Wang
Hi,

We saw NULL pointer dereference below:

Dec 28 16:24:26 server kernel: [979193.076399] BUG: unable to handle
kernel NULL pointer dereference at 0008
Dec 28 16:24:26 server kernel: [979193.076401] IP: [8116952f]
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076408] PGD 4bdcaa067 PUD
4bdc43067 PMD 0
Dec 28 16:24:26 server kernel: [979193.076410] Oops:  [#1] SMP
Dec 28 16:24:26 server kernel: [979193.076412] CPU 6
Dec 28 16:24:26 server kernel: [979193.076413] Modules linked in: bridge
stp llc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
raid1 md_mod dm_round_robin sd_mod crc_t10dif ib_srp scsi_transport_srp
scsi_tgt xt_ETHOIP6(O) x_tables vhost_net(O) macvtap macvlan tun(O)
nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 rdma_ucm rdma_cm iw_cm
ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad ib_qib mlx4_ib ib_mthca
ib_mad ib_core dm_multipath scsi_dh scsi_mod kvm_amd kvm powernow_k8
mperf psmouse crc32c_intel microcode tpm_tis tpm tpm_bios serio_raw
evdev amd64_edac_mod edac_core edac_mce_amd i2c_piix4 button processor
thermal_sys mlx4_core
Dec 28 16:24:26 server kernel: [979193.076440]
Dec 28 16:24:26 server kernel: [979193.076442] Pid: 56544, comm:
multipath Tainted: G   O 3.4.71-3-pserver #1 Supermicro BHQGE/BHQGE
Dec 28 16:24:26 server kernel: [979193.076445] RIP:
0010:[8116952f]  [8116952f] __blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076448] RSP:
0018:882802f4beb8  EFLAGS: 00010246
Dec 28 16:24:26 server kernel: [979193.076449] RAX: 
RBX: 881ff78b0d00 RCX: 0001
Dec 28 16:24:26 server kernel: [979193.076451] RDX: 
RSI: 001d RDI: 881ff78b0d18
Dec 28 16:24:26 server kernel: [979193.076452] RBP: 
R08:  R09: 
Dec 28 16:24:26 server kernel: [979193.076453] R10: 
R11: 0246 R12: 001d
Dec 28 16:24:26 server kernel: [979193.076455] R13: 881ff78b0d18
R14: 8807f9e7f400 R15: 8804a8d77710
Dec 28 16:24:26 server kernel: [979193.076457] FS:
7ff8c80fe7a0() GS:880807d8() knlGS:
Dec 28 16:24:26 server kernel: [979193.076458] CS:  0010 DS:  ES:
 CR0: 80050033
Dec 28 16:24:26 server kernel: [979193.076460] CR2: 0008
CR3: 00064765f000 CR4: 000407e0
Dec 28 16:24:26 server kernel: [979193.076461] DR0: 
DR1:  DR2: 
Dec 28 16:24:26 server kernel: [979193.076463] DR3: 
DR6: 0ff0 DR7: 0400
Dec 28 16:24:26 server kernel: [979193.076464] Process multipath (pid:
56544, threadinfo 882802f4a000, task 8828020106d0)
Dec 28 16:24:26 server kernel: [979193.076466] Stack:
Dec 28 16:24:26 server kernel: [979193.076466]  
 880803cf2580 8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076468]  0010
88100363eff0 881004609b00 882003c20020
Dec 28 16:24:26 server kernel: [979193.076470]  8804a8d77710
81136bad 7fffbdc8f420 8804a8d77700
Dec 28 16:24:26 server kernel: [979193.076472] Call Trace:
Dec 28 16:24:26 server kernel: [979193.076477]  [81136bad] ?
fput+0xdd/0x270
Dec 28 16:24:26 server kernel: [979193.076479]  [81132f0c] ?
filp_close+0x5c/0x90
Dec 28 16:24:26 server kernel: [979193.076481]  [81132fb1] ?
sys_close+0x71/0xc0
Dec 28 16:24:26 server kernel: [979193.076484]  [816801b9] ?
system_call_fastpath+0x16/0x1b
Dec 28 16:24:26 server kernel: [979193.076486] Code: 8b 5c 24 18 48 8b
6c 24 20 4c 8b 64 24 28 4c 8b 6c 24 30 4c 8b 74 24 38 4c 8b 7c 24 40 48
83 c4 48 c3 66 90 49 8b 86 48 03 00 00 48 8b 40 08 48 85 c0 0f 84 fc
fe ff ff 44 89 e6 4c 89 f7 ff d0
Dec 28 16:24:26 server kernel: [979193.076500] RIP  [8116952f]
__blkdev_put+0x17f/0x1d0
Dec 28 16:24:26 server kernel: [979193.076503]  RSP 882802f4beb8
Dec 28 16:24:26 server kernel: [979193.076504] CR2: 0008
Dec 28 16:24:26 server kernel: [979193.077599] ---[ end trace
23f39da823d257f9 ]---

disassamble results show:
1465static int __blkdev_put(struct block_device *bdev, fmode_t mode,
int for_part)
1466{
   0x81162d10 +0: sub$0x48,%rsp
   0x81162d14 +4: mov%r13,0x30(%rsp)
   0x81162d1d +13:mov%rbx,0x18(%rsp)
   0x81162d22 +18:mov%rbp,0x20(%rsp)
   0x81162d27 +23:mov%r12,0x28(%rsp)
   0x81162d2c +28:mov%edx,%ebp
   0x81162d2e +30:mov%r14,0x38(%rsp)
   0x81162d33 +35:mov%r15,0x40(%rsp)
   0x81162d38 +40:mov%rdi,%rbx
   0x81162d45 +53:mov%esi,%r12d

1467int ret = 0;
   0x81162d8e +126:   xor%ebp,%ebp

1468struct gendisk *disk = bdev-bd_disk;
   0x81162d3b +43:mov0x90(%rdi),%r14

1469struct 

Re: [PATCH 092/104] mm: fix aio performance regression for database caused by THP

2013-09-30 Thread Jack Wang
On 09/30/2013 12:11 PM, Luis Henriques wrote:
> 3.5.7.22 -stable review patch.  If anyone has any objections, please let me 
> know.
> 
> --
> 
> From: Khalid Aziz 
> 
> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
> 
> I am working with a tool that simulates oracle database I/O workload.
> This tool (orion to be specific -
> )
> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
> does aio into these pages from flash disks using various common block
> sizes used by database.  I am looking at performance with two of the most
> common block sizes - 1M and 64K.  aio performance with these two block
> sizes plunged after Transparent HugePages was introduced in the kernel.
> Here are performance numbers:
> 
>   pre-THP 2.6.39  3.11-rc5
> 1M read   8384 MB/s   5629 MB/s   6501 MB/s
> 64K read  7867 MB/s   4576 MB/s   4251 MB/s
> 
> I have narrowed the performance impact down to the overheads introduced by
> THP in __get_page_tail() and put_compound_page() routines.  perf top shows
>> 40% of cycles being spent in these two routines.  Every time direct I/O
> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
> the pages and calls put_page() when I/O completes to put the reference
> away.  THP introduced significant amount of locking overhead to get_page()
> and put_page() when dealing with compound pages because hugepages can be
> split underneath get_page() and put_page().  It added this overhead
> irrespective of whether it is dealing with hugetlbfs pages or transparent
> hugepages.  This resulted in 20%-45% drop in aio performance when using
> hugetlbfs pages.
> 
> Since hugetlbfs pages can not be split, there is no reason to go through
> all the locking overhead for these pages from what I can see.  I added
> code to __get_page_tail() and put_compound_page() to bypass all the
> locking code when working with hugetlbfs pages.  This improved performance
> significantly.  Performance numbers with this patch:
> 
>   pre-THP 3.11-rc53.11-rc5 + Patch
> 1M read   8384 MB/s   6501 MB/s   8371 MB/s
> 64K read  7867 MB/s   4251 MB/s   6510 MB/s
> 
> Performance with 64K read is still lower than what it was before THP, but
> still a 53% improvement.  It does mean there is more work to be done but I
> will take a 53% improvement for now.
> 
> Please take a look at the following patch and let me know if it looks
> reasonable.
> 
> [a...@linux-foundation.org: tweak comments]
> Signed-off-by: Khalid Aziz 
> Cc: Pravin B Shelar 
> Cc: Christoph Lameter 
> Cc: Andrea Arcangeli 
> Cc: Johannes Weiner 
> Cc: Mel Gorman 
> Cc: Rik van Riel 
> Cc: Minchan Kim 
> Cc: Andi Kleen 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Linus Torvalds 
> [ luis: backported to 3.5: adjusted context ]
> Signed-off-by: Luis Henriques 
Hi Greg,

I suppose this patch also needed for 3.4, right?

Regards,
Jack


> ---
>  mm/swap.c | 77 
> ++-
>  1 file changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 4e7e2ec..0c833e8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "internal.h"
>  
> @@ -77,6 +78,19 @@ static void __put_compound_page(struct page *page)
>  
>  static void put_compound_page(struct page *page)
>  {
> + /*
> +  * hugetlbfs pages cannot be split from under us.  If this is a
> +  * hugetlbfs page, check refcount on head page and release the page if
> +  * the refcount becomes zero.
> +  */
> + if (PageHuge(page)) {
> + page = compound_head(page);
> + if (put_page_testzero(page))
> + __put_compound_page(page);
> +
> + return;
> + }
> +
>   if (unlikely(PageTail(page))) {
>   /* __split_huge_page_refcount can run under us */
>   struct page *page_head = compound_trans_head(page);
> @@ -180,38 +194,51 @@ bool __get_page_tail(struct page *page)
>* proper PT lock that already serializes against
>* split_huge_page().
>*/
> - unsigned long flags;
>   bool got = false;
> - struct page *page_head = compound_trans_head(page);
> + struct page *page_head;
>  
> - if (likely(page != page_head && get_page_unless_zero(page_head))) {
> + /*
> +  * If this is a hugetlbfs page it cannot be split under us.  Simply
> +  * increment refcount for the head page.
> +  */
> + if (PageHuge(page)) {
> + page_head = compound_head(page);
> + atomic_inc(_head->_count);
> + got = true;
> + } else {
> + unsigned long flags;
> +
> + page_head = compound_trans_head(page);
> +

Re: [PATCH 092/104] mm: fix aio performance regression for database caused by THP

2013-09-30 Thread Jack Wang
On 09/30/2013 12:11 PM, Luis Henriques wrote:
 3.5.7.22 -stable review patch.  If anyone has any objections, please let me 
 know.
 
 --
 
 From: Khalid Aziz khalid.a...@oracle.com
 
 commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
 
 I am working with a tool that simulates oracle database I/O workload.
 This tool (orion to be specific -
 http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24)
 allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
 does aio into these pages from flash disks using various common block
 sizes used by database.  I am looking at performance with two of the most
 common block sizes - 1M and 64K.  aio performance with these two block
 sizes plunged after Transparent HugePages was introduced in the kernel.
 Here are performance numbers:
 
   pre-THP 2.6.39  3.11-rc5
 1M read   8384 MB/s   5629 MB/s   6501 MB/s
 64K read  7867 MB/s   4576 MB/s   4251 MB/s
 
 I have narrowed the performance impact down to the overheads introduced by
 THP in __get_page_tail() and put_compound_page() routines.  perf top shows
 40% of cycles being spent in these two routines.  Every time direct I/O
 to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
 the pages and calls put_page() when I/O completes to put the reference
 away.  THP introduced significant amount of locking overhead to get_page()
 and put_page() when dealing with compound pages because hugepages can be
 split underneath get_page() and put_page().  It added this overhead
 irrespective of whether it is dealing with hugetlbfs pages or transparent
 hugepages.  This resulted in 20%-45% drop in aio performance when using
 hugetlbfs pages.
 
 Since hugetlbfs pages can not be split, there is no reason to go through
 all the locking overhead for these pages from what I can see.  I added
 code to __get_page_tail() and put_compound_page() to bypass all the
 locking code when working with hugetlbfs pages.  This improved performance
 significantly.  Performance numbers with this patch:
 
   pre-THP 3.11-rc53.11-rc5 + Patch
 1M read   8384 MB/s   6501 MB/s   8371 MB/s
 64K read  7867 MB/s   4251 MB/s   6510 MB/s
 
 Performance with 64K read is still lower than what it was before THP, but
 still a 53% improvement.  It does mean there is more work to be done but I
 will take a 53% improvement for now.
 
 Please take a look at the following patch and let me know if it looks
 reasonable.
 
 [a...@linux-foundation.org: tweak comments]
 Signed-off-by: Khalid Aziz khalid.a...@oracle.com
 Cc: Pravin B Shelar pshe...@nicira.com
 Cc: Christoph Lameter c...@linux.com
 Cc: Andrea Arcangeli aarca...@redhat.com
 Cc: Johannes Weiner han...@cmpxchg.org
 Cc: Mel Gorman m...@csn.ul.ie
 Cc: Rik van Riel r...@redhat.com
 Cc: Minchan Kim minc...@kernel.org
 Cc: Andi Kleen a...@firstfloor.org
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Linus Torvalds torva...@linux-foundation.org
 [ luis: backported to 3.5: adjusted context ]
 Signed-off-by: Luis Henriques luis.henriq...@canonical.com
Hi Greg,

I suppose this patch also needed for 3.4, right?

Regards,
Jack


 ---
  mm/swap.c | 77 
 ++-
  1 file changed, 52 insertions(+), 25 deletions(-)
 
 diff --git a/mm/swap.c b/mm/swap.c
 index 4e7e2ec..0c833e8 100644
 --- a/mm/swap.c
 +++ b/mm/swap.c
 @@ -30,6 +30,7 @@
  #include linux/backing-dev.h
  #include linux/memcontrol.h
  #include linux/gfp.h
 +#include linux/hugetlb.h
  
  #include internal.h
  
 @@ -77,6 +78,19 @@ static void __put_compound_page(struct page *page)
  
  static void put_compound_page(struct page *page)
  {
 + /*
 +  * hugetlbfs pages cannot be split from under us.  If this is a
 +  * hugetlbfs page, check refcount on head page and release the page if
 +  * the refcount becomes zero.
 +  */
 + if (PageHuge(page)) {
 + page = compound_head(page);
 + if (put_page_testzero(page))
 + __put_compound_page(page);
 +
 + return;
 + }
 +
   if (unlikely(PageTail(page))) {
   /* __split_huge_page_refcount can run under us */
   struct page *page_head = compound_trans_head(page);
 @@ -180,38 +194,51 @@ bool __get_page_tail(struct page *page)
* proper PT lock that already serializes against
* split_huge_page().
*/
 - unsigned long flags;
   bool got = false;
 - struct page *page_head = compound_trans_head(page);
 + struct page *page_head;
  
 - if (likely(page != page_head  get_page_unless_zero(page_head))) {
 + /*
 +  * If this is a hugetlbfs page it cannot be split under us.  Simply
 +  * increment refcount for the head page.
 +  */
 + if (PageHuge(page)) {
 + page_head = compound_head(page);
 

Re: Drivers: scsi: FLUSH timeout

2013-09-24 Thread Jack Wang
On 09/21/2013 07:24 AM, KY Srinivasan wrote:
> 
> 
>> -Original Message-
>> From: Greg KH [mailto:gre...@linuxfoundation.org]
>> Sent: Friday, September 20, 2013 1:32 PM
>> To: KY Srinivasan
>> Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
>> oher...@suse.com; jbottom...@parallels.com; h...@infradead.org; linux-
>> s...@vger.kernel.org
>> Subject: Re: Drivers: scsi: FLUSH timeout
>>
>> On Fri, Sep 20, 2013 at 12:32:27PM -0700, K. Y. Srinivasan wrote:
>>> The SD_FLUSH_TIMEOUT value is currently hardcoded.
>>
>> Hardcoded where?  Please, more context.
> 
> This is defined in scsi/sd.h:
> 
> #define SD_FLUSH_TIMEOUT(60 * HZ)
>>
>>> On our cloud, we sometimes hit this timeout. I was wondering if we
>>> could make this a module parameter. If this is acceptable, I can send
>>> you a patch for this.
>>
>> A module parameter don't make sense for a per-device value, does it?
> Currently, the 60 second timeout is applied across devices. Ideally, I want 
> to be
> able to control the FLUSH TIMEOUT as we currently do I/O timeout. If this is 
> acceptable, I can work on a patch for that as well.
> 
> Regards,
> 
> K. Y
>>
>> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Hi,

Back to 2010, Mike(cc-ed) try to add a flush time out interface, similar
to what you want here, no idea why it's just ignored?
http://www.spinics.net/lists/linux-scsi/msg45017.html

Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Drivers: scsi: FLUSH timeout

2013-09-24 Thread Jack Wang
On 09/21/2013 07:24 AM, KY Srinivasan wrote:
 
 
 -Original Message-
 From: Greg KH [mailto:gre...@linuxfoundation.org]
 Sent: Friday, September 20, 2013 1:32 PM
 To: KY Srinivasan
 Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
 oher...@suse.com; jbottom...@parallels.com; h...@infradead.org; linux-
 s...@vger.kernel.org
 Subject: Re: Drivers: scsi: FLUSH timeout

 On Fri, Sep 20, 2013 at 12:32:27PM -0700, K. Y. Srinivasan wrote:
 The SD_FLUSH_TIMEOUT value is currently hardcoded.

 Hardcoded where?  Please, more context.
 
 This is defined in scsi/sd.h:
 
 #define SD_FLUSH_TIMEOUT(60 * HZ)

 On our cloud, we sometimes hit this timeout. I was wondering if we
 could make this a module parameter. If this is acceptable, I can send
 you a patch for this.

 A module parameter don't make sense for a per-device value, does it?
 Currently, the 60 second timeout is applied across devices. Ideally, I want 
 to be
 able to control the FLUSH TIMEOUT as we currently do I/O timeout. If this is 
 acceptable, I can work on a patch for that as well.
 
 Regards,
 
 K. Y

 greg k-h
 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
Hi,

Back to 2010, Mike(cc-ed) try to add a flush time out interface, similar
to what you want here, no idea why it's just ignored?
http://www.spinics.net/lists/linux-scsi/msg45017.html

Jack
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/9] scsi/pm8001: use pdev->pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)

2013-06-26 Thread Jack Wang
On 06/18/2013 10:23 AM, Yijing Wang wrote:
> Pci core has been saved pm cap register offset by pdev->pm_cap in 
> pci_pm_init()
> in init path. So we can use pdev->pm_cap instead of using
> pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and 
> simplified code.
> 
I think Lindar already replied to you, and tested it on hardware.

So you can add her:
Acked-by or Tested-by


Regards,
Jack
> Signed-off-by: Yijing Wang 
> Cc: xjtu...@gmail.com
> Cc: lindar_...@usish.com
> Cc: "James E.J. Bottomley" 
> Cc: linux-s...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/scsi/pm8001/pm8001_init.c |7 +++
>  1 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
> b/drivers/scsi/pm8001/pm8001_init.c
> index e4b9bc7..3861aa1 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -912,14 +912,13 @@ static int pm8001_pci_suspend(struct pci_dev *pdev, 
> pm_message_t state)
>  {
>   struct sas_ha_struct *sha = pci_get_drvdata(pdev);
>   struct pm8001_hba_info *pm8001_ha;
> - int i , pos;
> + int i;
>   u32 device_state;
>   pm8001_ha = sha->lldd_ha;
>   flush_workqueue(pm8001_wq);
>   scsi_block_requests(pm8001_ha->shost);
> - pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
> - if (pos == 0) {
> - printk(KERN_ERR " PCI PM not supported\n");
> + if (!pdev->pm_cap) {
> + dev_err(>dev, " PCI PM not supported\n");
>   return -ENODEV;
>   }
>   PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 8/9] scsi/pm8001: use pdev-pm_cap instead of pci_find_capability(..,PCI_CAP_ID_PM)

2013-06-26 Thread Jack Wang
On 06/18/2013 10:23 AM, Yijing Wang wrote:
 Pci core has been saved pm cap register offset by pdev-pm_cap in 
 pci_pm_init()
 in init path. So we can use pdev-pm_cap instead of using
 pci_find_capability(pdev, PCI_CAP_ID_PM) for better performance and 
 simplified code.
 
I think Lindar already replied to you, and tested it on hardware.

So you can add her:
Acked-by or Tested-by


Regards,
Jack
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 Cc: xjtu...@gmail.com
 Cc: lindar_...@usish.com
 Cc: James E.J. Bottomley jbottom...@parallels.com
 Cc: linux-s...@vger.kernel.org
 Cc: linux-kernel@vger.kernel.org
 ---
  drivers/scsi/pm8001/pm8001_init.c |7 +++
  1 files changed, 3 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/scsi/pm8001/pm8001_init.c 
 b/drivers/scsi/pm8001/pm8001_init.c
 index e4b9bc7..3861aa1 100644
 --- a/drivers/scsi/pm8001/pm8001_init.c
 +++ b/drivers/scsi/pm8001/pm8001_init.c
 @@ -912,14 +912,13 @@ static int pm8001_pci_suspend(struct pci_dev *pdev, 
 pm_message_t state)
  {
   struct sas_ha_struct *sha = pci_get_drvdata(pdev);
   struct pm8001_hba_info *pm8001_ha;
 - int i , pos;
 + int i;
   u32 device_state;
   pm8001_ha = sha-lldd_ha;
   flush_workqueue(pm8001_wq);
   scsi_block_requests(pm8001_ha-shost);
 - pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
 - if (pos == 0) {
 - printk(KERN_ERR  PCI PM not supported\n);
 + if (!pdev-pm_cap) {
 + dev_err(pdev-dev,  PCI PM not supported\n);
   return -ENODEV;
   }
   PM8001_CHIP_DISP-interrupt_disable(pm8001_ha, 0xFF);
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


kernel tried to execute NX-protected page - exploit attempt? (uid: 998)

2013-05-27 Thread Jack Wang
Hi all,

We saw below bug in our production.

Kernel is linux 3.4.23, as I know it means control was transferred to a
data page. This could happen because of a stack overflow (overwrite
return address with bogus pointer into data pages), or by calling a
function pointer which isn't pointing where it's supposed to be pointing?
>From the back trace it seems code BUG at VFS layer, I checked commit
history in file fs/namei.c, not found any clue, I also checked commit
history from 3.4.23 to 3.4.47, haven't find possible fix.

Anyone can give some suggestion or clue about this bug?


May 26 02:17:27 pserver107 pbmonitor: List sent (264 entries out of 616
total, 616 allocated)
May 26 02:18:02 pserver107 slog[3485]: vcb: VM (UUID
724a9458-ae76-b9c7-3434-ea9800effcff) not running.
May 26 02:18:03 pserver107 slog[3485]: vcb: VM (UUID
b62739d1-738f-d02d-b35d-ffadcf9251a8) not running.
May 26 02:18:04 pserver107 slog[3485]: vcb: VM (UUID
5b378a75-5512-4ea1-99ba-933c2d2c1716) not running.
May 26 02:19:04 pserver107 [736175.109085] kernel tried to execute
NX-protected page - exploit attempt? (uid: 998)
May 26 02:19:04 pserver107 [736175.109310] BUG: unable to handle kernel
May 26 02:19:04 pserver107  at 8807f9287e08
May 26 02:19:04 pserver107 [736175.109429] IP:
May 26 02:19:04 pserver107  [] 0x8807f9287e07
May 26 02:19:04 pserver107 [736175.109545] PGD 1a0c063
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109664] Oops: 0011 [#1]
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109782] CPU 50
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109796] Modules linked in:
May 26 02:19:04 pserver107  fuse
May 26 02:19:04 pserver107  bridge
May 26 02:19:04 pserver107  stp
May 26 02:19:04 pserver107  llc
May 26 02:19:04 pserver107  nf_conntrack_ipv6
May 26 02:19:04 pserver107  nf_defrag_ipv6
May 26 02:19:04 pserver107  ip6table_filter
May 26 02:19:04 pserver107  ip6_tables
May 26 02:19:04 pserver107  dm_round_robin
May 26 02:19:04 pserver107  sd_mod
May 26 02:19:04 pserver107  crc_t10dif
May 26 02:19:04 pserver107  ib_srp
May 26 02:19:04 pserver107  scsi_transport_srp
May 26 02:19:04 pserver107  scsi_tgt
May 26 02:19:04 pserver107  xt_ETHOIP6(O)
May 26 02:19:04 pserver107  x_tables
May 26 02:19:04 pserver107  vhost_net(O)
May 26 02:19:04 pserver107  macvtap
May 26 02:19:04 pserver107  macvlan
May 26 02:19:04 pserver107  tun(O)
May 26 02:19:04 pserver107  nf_conntrack_ipv4
May 26 02:19:04 pserver107  nf_conntrack
May 26 02:19:04 pserver107  nf_defrag_ipv4
May 26 02:19:04 pserver107  rdma_ucm
May 26 02:19:04 pserver107  rdma_cm
May 26 02:19:04 pserver107  iw_cm
May 26 02:19:04 pserver107  ib_addr
May 26 02:19:04 pserver107  ib_ipoib
May 26 02:19:04 pserver107  ib_cm
May 26 02:19:04 pserver107  ib_sa
May 26 02:19:04 pserver107  ib_uverbs
May 26 02:19:04 pserver107  ib_umad
May 26 02:19:04 pserver107  ib_qib
May 26 02:19:04 pserver107  mlx4_ib
May 26 02:19:04 pserver107  ib_mthca
May 26 02:19:04 pserver107  ib_mad
May 26 02:19:04 pserver107  ib_core
May 26 02:19:04 pserver107  dm_multipath
May 26 02:19:04 pserver107  scsi_dh
May 26 02:19:04 pserver107  kvm_amd
May 26 02:19:04 pserver107  kvm
May 26 02:19:04 pserver107  sg
May 26 02:19:04 pserver107  powernow_k8
May 26 02:19:04 pserver107  psmouse
May 26 02:19:04 pserver107  mperf
May 26 02:19:04 pserver107  crc32c_intel
May 26 02:19:04 pserver107  microcode
May 26 02:19:04 pserver107  tpm_tis
May 26 02:19:04 pserver107  tpm
May 26 02:19:04 pserver107  tpm_bios
May 26 02:19:04 pserver107  serio_raw
May 26 02:19:04 pserver107  evdev
May 26 02:19:04 pserver107  usb_storage
May 26 02:19:04 pserver107  scsi_mod
May 26 02:19:04 pserver107  amd64_edac_mod
May 26 02:19:04 pserver107  edac_core
May 26 02:19:04 pserver107  edac_mce_amd
May 26 02:19:04 pserver107  i2c_piix4
May 26 02:19:04 pserver107  button
May 26 02:19:04 pserver107  processor
May 26 02:19:04 pserver107  thermal_sys
May 26 02:19:04 pserver107  mlx4_core
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.04]
May 26 02:19:04 pserver107 [736175.111202] Pid: 3485, comm: vcb Tainted:
G   O 3.4.23-pserver #1
May 26 02:19:04 pserver107  Supermicro H8QG6
May 26 02:19:04 pserver107 /H8QG6
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.111423] RIP: 0010:[]
May 26 02:19:04 pserver107  [] 0x8807f9287e07
May 26 02:19:04 pserver107 [736175.111626] RSP: 0018:8807f9287cf0
EFLAGS: 00010286
May 26 02:19:04 pserver107 [736175.111737] RAX: 81345cb0 RBX:
88080740e910 RCX: 0038
May 26 02:19:04 pserver107 [736175.111938] RDX: 0125 RSI:
882ffeef6630 RDI: 882ffeef6630
May 26 02:19:04 pserver107 [736175.112147] RBP: 811923c9 R08:
0007 R09: 880803b07d78
May 26 02:19:04 pserver107 [736175.112364] R10: 30303532 R11:
8807f9287d90 R12: 880803b07d40
May 26 02:19:04 pserver107 [736175.112563] R13: 8830044c3ec0 R14:
881804288020 R15: 880803b07d40
May 26 02:19:04 

kernel tried to execute NX-protected page - exploit attempt? (uid: 998)

2013-05-27 Thread Jack Wang
Hi all,

We saw below bug in our production.

Kernel is linux 3.4.23, as I know it means control was transferred to a
data page. This could happen because of a stack overflow (overwrite
return address with bogus pointer into data pages), or by calling a
function pointer which isn't pointing where it's supposed to be pointing?
From the back trace it seems code BUG at VFS layer, I checked commit
history in file fs/namei.c, not found any clue, I also checked commit
history from 3.4.23 to 3.4.47, haven't find possible fix.

Anyone can give some suggestion or clue about this bug?


May 26 02:17:27 pserver107 pbmonitor: List sent (264 entries out of 616
total, 616 allocated)
May 26 02:18:02 pserver107 slog[3485]: vcb: VM (UUID
724a9458-ae76-b9c7-3434-ea9800effcff) not running.
May 26 02:18:03 pserver107 slog[3485]: vcb: VM (UUID
b62739d1-738f-d02d-b35d-ffadcf9251a8) not running.
May 26 02:18:04 pserver107 slog[3485]: vcb: VM (UUID
5b378a75-5512-4ea1-99ba-933c2d2c1716) not running.
May 26 02:19:04 pserver107 [736175.109085] kernel tried to execute
NX-protected page - exploit attempt? (uid: 998)
May 26 02:19:04 pserver107 [736175.109310] BUG: unable to handle kernel
May 26 02:19:04 pserver107  at 8807f9287e08
May 26 02:19:04 pserver107 [736175.109429] IP:
May 26 02:19:04 pserver107  [8807f9287e08] 0x8807f9287e07
May 26 02:19:04 pserver107 [736175.109545] PGD 1a0c063
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109664] Oops: 0011 [#1]
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109782] CPU 50
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.109796] Modules linked in:
May 26 02:19:04 pserver107  fuse
May 26 02:19:04 pserver107  bridge
May 26 02:19:04 pserver107  stp
May 26 02:19:04 pserver107  llc
May 26 02:19:04 pserver107  nf_conntrack_ipv6
May 26 02:19:04 pserver107  nf_defrag_ipv6
May 26 02:19:04 pserver107  ip6table_filter
May 26 02:19:04 pserver107  ip6_tables
May 26 02:19:04 pserver107  dm_round_robin
May 26 02:19:04 pserver107  sd_mod
May 26 02:19:04 pserver107  crc_t10dif
May 26 02:19:04 pserver107  ib_srp
May 26 02:19:04 pserver107  scsi_transport_srp
May 26 02:19:04 pserver107  scsi_tgt
May 26 02:19:04 pserver107  xt_ETHOIP6(O)
May 26 02:19:04 pserver107  x_tables
May 26 02:19:04 pserver107  vhost_net(O)
May 26 02:19:04 pserver107  macvtap
May 26 02:19:04 pserver107  macvlan
May 26 02:19:04 pserver107  tun(O)
May 26 02:19:04 pserver107  nf_conntrack_ipv4
May 26 02:19:04 pserver107  nf_conntrack
May 26 02:19:04 pserver107  nf_defrag_ipv4
May 26 02:19:04 pserver107  rdma_ucm
May 26 02:19:04 pserver107  rdma_cm
May 26 02:19:04 pserver107  iw_cm
May 26 02:19:04 pserver107  ib_addr
May 26 02:19:04 pserver107  ib_ipoib
May 26 02:19:04 pserver107  ib_cm
May 26 02:19:04 pserver107  ib_sa
May 26 02:19:04 pserver107  ib_uverbs
May 26 02:19:04 pserver107  ib_umad
May 26 02:19:04 pserver107  ib_qib
May 26 02:19:04 pserver107  mlx4_ib
May 26 02:19:04 pserver107  ib_mthca
May 26 02:19:04 pserver107  ib_mad
May 26 02:19:04 pserver107  ib_core
May 26 02:19:04 pserver107  dm_multipath
May 26 02:19:04 pserver107  scsi_dh
May 26 02:19:04 pserver107  kvm_amd
May 26 02:19:04 pserver107  kvm
May 26 02:19:04 pserver107  sg
May 26 02:19:04 pserver107  powernow_k8
May 26 02:19:04 pserver107  psmouse
May 26 02:19:04 pserver107  mperf
May 26 02:19:04 pserver107  crc32c_intel
May 26 02:19:04 pserver107  microcode
May 26 02:19:04 pserver107  tpm_tis
May 26 02:19:04 pserver107  tpm
May 26 02:19:04 pserver107  tpm_bios
May 26 02:19:04 pserver107  serio_raw
May 26 02:19:04 pserver107  evdev
May 26 02:19:04 pserver107  usb_storage
May 26 02:19:04 pserver107  scsi_mod
May 26 02:19:04 pserver107  amd64_edac_mod
May 26 02:19:04 pserver107  edac_core
May 26 02:19:04 pserver107  edac_mce_amd
May 26 02:19:04 pserver107  i2c_piix4
May 26 02:19:04 pserver107  button
May 26 02:19:04 pserver107  processor
May 26 02:19:04 pserver107  thermal_sys
May 26 02:19:04 pserver107  mlx4_core
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.04]
May 26 02:19:04 pserver107 [736175.111202] Pid: 3485, comm: vcb Tainted:
G   O 3.4.23-pserver #1
May 26 02:19:04 pserver107  Supermicro H8QG6
May 26 02:19:04 pserver107 /H8QG6
May 26 02:19:04 pserver107
May 26 02:19:04 pserver107 [736175.111423] RIP: 0010:[8807f9287e08]
May 26 02:19:04 pserver107  [8807f9287e08] 0x8807f9287e07
May 26 02:19:04 pserver107 [736175.111626] RSP: 0018:8807f9287cf0
EFLAGS: 00010286
May 26 02:19:04 pserver107 [736175.111737] RAX: 81345cb0 RBX:
88080740e910 RCX: 0038
May 26 02:19:04 pserver107 [736175.111938] RDX: 0125 RSI:
882ffeef6630 RDI: 882ffeef6630
May 26 02:19:04 pserver107 [736175.112147] RBP: 811923c9 R08:
0007 R09: 880803b07d78
May 26 02:19:04 pserver107 [736175.112364] R10: 30303532 R11:
8807f9287d90 R12: 880803b07d40
May 26 02:19:04 pserver107 [736175.112563] R13: 8830044c3ec0 R14:

RE: How to online remove an error scsi disk from the system?

2013-02-01 Thread Jack Wang

On 02/01/2013 04:50 PM, Jack Wang wrote:
> Hi All,
>   In our product system, we have several sata disks attached to one 
> machine. So when one of the disk fails, the jbd2(yes, we use ext4) 
> will hang forever and we will get something in /var/log/messages like
below.
> It seems to me that the io sent to the scsi layer is never returned 
> back with -EIO which is a little bit surprised for me(It should be a 
> timeout somewhere, right?). We have tried echo "offline" > 
> /sys/block/sdl/device/state, but it doesn't work. So is there any way 
> for us to let the scsi device returns all the io requests back with 
> EIO so that all the end_io can be called accordingly? Am I missing
something here?
> 
> Thanks,
> Tao
> [Jack Wang]
> Hi Tao,
> 
> Have you tried:
>  echo 1 > /sys/block/sdv/device/delete
It will do some IO first so it will hang doing IO.
>  echo "- - -" > /sys/class/scsi_host/host
What do you mean for this line?

[Jack Wang] Sorry I mean to let the driver rescan to get the disk back.
The line should be :
 echo "- - -" > /sys/class/scsi_host/hostx/scan.

Per above delete does not work , so no need to run this.
> 
> another way is :
> find out which phy the disk attached to and:
> echo 1 > /sys/class/sas_phy/phy-x:x:x/link_reset
sorry, I have done it, but there is no response.

[Jack Wang] 
What about
echo 1 > /sys/class/sas_phy/phy-x:x:x/hard_reset

?
Thanks,
Tao
> 
> Jack
> 
> --
> To unsubscribe from this list: send the line "unsubscribe 
> linux-kernel" in the body of a message to majord...@vger.kernel.org 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the
body of a message to majord...@vger.kernel.org More majordomo info at
http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: How to online remove an error scsi disk from the system?

2013-02-01 Thread Jack Wang
Hi All,
In our product system, we have several sata disks attached to one
machine. So when one of the disk fails, the jbd2(yes, we use ext4) will hang
forever and we will get something in /var/log/messages like below.
It seems to me that the io sent to the scsi layer is never returned back
with -EIO which is a little bit surprised for me(It should be a timeout
somewhere, right?). We have tried echo "offline" >
/sys/block/sdl/device/state, but it doesn't work. So is there any way for us
to let the scsi device returns all the io requests back with EIO so that all
the end_io can be called accordingly? Am I missing something here?

Thanks,
Tao
[Jack Wang] 
Hi Tao,

Have you tried:
 echo 1 > /sys/block/sdv/device/delete
 echo "- - -" > /sys/class/scsi_host/host

another way is :
find out which phy the disk attached to and:
echo 1 > /sys/class/sas_phy/phy-x:x:x/link_reset

Jack

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: How to online remove an error scsi disk from the system?

2013-02-01 Thread Jack Wang
Hi All,
In our product system, we have several sata disks attached to one
machine. So when one of the disk fails, the jbd2(yes, we use ext4) will hang
forever and we will get something in /var/log/messages like below.
It seems to me that the io sent to the scsi layer is never returned back
with -EIO which is a little bit surprised for me(It should be a timeout
somewhere, right?). We have tried echo offline 
/sys/block/sdl/device/state, but it doesn't work. So is there any way for us
to let the scsi device returns all the io requests back with EIO so that all
the end_io can be called accordingly? Am I missing something here?

Thanks,
Tao
[Jack Wang] 
Hi Tao,

Have you tried:
 echo 1  /sys/block/sdv/device/delete
 echo - - -  /sys/class/scsi_host/host

another way is :
find out which phy the disk attached to and:
echo 1  /sys/class/sas_phy/phy-x:x:x/link_reset

Jack

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: How to online remove an error scsi disk from the system?

2013-02-01 Thread Jack Wang

On 02/01/2013 04:50 PM, Jack Wang wrote:
 Hi All,
   In our product system, we have several sata disks attached to one 
 machine. So when one of the disk fails, the jbd2(yes, we use ext4) 
 will hang forever and we will get something in /var/log/messages like
below.
 It seems to me that the io sent to the scsi layer is never returned 
 back with -EIO which is a little bit surprised for me(It should be a 
 timeout somewhere, right?). We have tried echo offline  
 /sys/block/sdl/device/state, but it doesn't work. So is there any way 
 for us to let the scsi device returns all the io requests back with 
 EIO so that all the end_io can be called accordingly? Am I missing
something here?
 
 Thanks,
 Tao
 [Jack Wang]
 Hi Tao,
 
 Have you tried:
  echo 1  /sys/block/sdv/device/delete
It will do some IO first so it will hang doing IO.
  echo - - -  /sys/class/scsi_host/host
What do you mean for this line?

[Jack Wang] Sorry I mean to let the driver rescan to get the disk back.
The line should be :
 echo - - -  /sys/class/scsi_host/hostx/scan.

Per above delete does not work , so no need to run this.
 
 another way is :
 find out which phy the disk attached to and:
 echo 1  /sys/class/sas_phy/phy-x:x:x/link_reset
sorry, I have done it, but there is no response.

[Jack Wang] 
What about
echo 1  /sys/class/sas_phy/phy-x:x:x/hard_reset

?
Thanks,
Tao
 
 Jack
 
 --
 To unsubscribe from this list: send the line unsubscribe 
 linux-kernel in the body of a message to majord...@vger.kernel.org 
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in the
body of a message to majord...@vger.kernel.org More majordomo info at
http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: mvsas regression since 3.5

2012-12-23 Thread Jack Wang


[Jack Wang] 
Do you see bug51881, which may relate to what you see
https://bugzilla.kernel.org/show_bug.cgi?id=51881

btw: also correct Dan's mail address.


I'm using Asus PIKE 6480 SAS card, whose chipset is "RAID bus
controller: Marvell Technology Group Ltd. MV64460/64461/64462 System 
Controller, Revision B", with latest stable branch 3.7.1 only 1 of 8 ports 
works, to get others works I got to pull & plug back. While with
3.5.7 it's all good, so it must be a regression.

Here's corresponding boot time dmesg of 3.7.1 with debug compiled:

[5.967302] udevd[1212]: starting version 171
[6.568908] mvsas :04:00.0: mvsas: driver version 0.8.16
[6.569431] mvsas :04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
[   10.980331] drivers/scsi/mvsas/mv_sas.c 1174:phy 0 attach dev info is 0
[   10.980334] drivers/scsi/mvsas/mv_sas.c 1176:phy 0 attach sas addr is 0
[   11.180909] drivers/scsi/mvsas/mv_sas.c 1174:phy 1 attach dev info is 0
[   11.180912] drivers/scsi/mvsas/mv_sas.c 1176:phy 1 attach sas addr is 1
[   11.381501] drivers/scsi/mvsas/mv_sas.c 1174:phy 2 attach dev info is 0
[   11.381503] drivers/scsi/mvsas/mv_sas.c 1176:phy 2 attach sas addr is 2
[   11.582085] drivers/scsi/mvsas/mv_sas.c 1174:phy 3 attach dev info is 0
[   11.582088] drivers/scsi/mvsas/mv_sas.c 1176:phy 3 attach sas addr is 3
[   11.782673] drivers/scsi/mvsas/mv_sas.c 1174:phy 4 attach dev info is 0
[   11.782676] drivers/scsi/mvsas/mv_sas.c 1176:phy 4 attach sas addr is 4
[   11.983262] drivers/scsi/mvsas/mv_sas.c 1174:phy 5 attach dev info is 800
[   11.983265] drivers/scsi/mvsas/mv_sas.c 1176:phy 5 attach sas addr is 5
[   12.183856] drivers/scsi/mvsas/mv_sas.c 1174:phy 6 attach dev info is 4
[   12.183858] drivers/scsi/mvsas/mv_sas.c 1176:phy 6 attach sas addr is 6
[   12.384438] drivers/scsi/mvsas/mv_sas.c 1174:phy 7 attach dev info is 0
[   12.384441] drivers/scsi/mvsas/mv_sas.c 1176:phy 7 attach sas addr is 7
[   12.384447] scsi6 : mvsas
[   12.385479] drivers/scsi/mvsas/mv_sas.c 277:phy 0 byte dmaded.
[   12.385482] drivers/scsi/mvsas/mv_sas.c 277:phy 1 byte dmaded.
[   12.385488] drivers/scsi/mvsas/mv_sas.c 277:phy 2 byte dmaded.
[   12.385491] drivers/scsi/mvsas/mv_sas.c 277:phy 3 byte dmaded.
[   12.385494] drivers/scsi/mvsas/mv_sas.c 277:phy 4 byte dmaded.
[   12.385497] drivers/scsi/mvsas/mv_sas.c 277:phy 5 byte dmaded.
[   12.385500] drivers/scsi/mvsas/mv_sas.c 277:phy 6 byte dmaded.
[   12.385503] drivers/scsi/mvsas/mv_sas.c 277:phy 7 byte dmaded.
[   12.385614] sas: phy-6:0 added to port-6:0, phy_mask:0x1 (   0)
[   12.385667] sas: phy-6:1 added to port-6:1, phy_mask:0x2 ( 100)
[   12.385716] sas: phy-6:2 added to port-6:2, phy_mask:0x4 ( 200)
[   12.385759] sas: phy-6:3 added to port-6:3, phy_mask:0x8 ( 300)
[   12.385801] sas: phy-6:4 added to port-6:4, phy_mask:0x10 ( 400)
[   12.385851] sas: phy-6:5 added to port-6:5, phy_mask:0x20 ( 500)
[   12.385895] sas: phy-6:6 added to port-6:6, phy_mask:0x40 ( 600)
[   12.385942] sas: phy-6:7 added to port-6:7, phy_mask:0x80 ( 700)
[   12.385967] sas: DOING DISCOVERY on port 0, pid:847
[   12.385974] sas: DONE DISCOVERY on port 0, pid:847, result:0
[   12.385986] sas: DOING DISCOVERY on port 1, pid:847
[   12.385991] sas: DONE DISCOVERY on port 1, pid:847, result:0
[   12.386003] sas: DOING DISCOVERY on port 2, pid:847
[   12.386009] sas: DONE DISCOVERY on port 2, pid:847, result:0
[   12.386020] sas: DOING DISCOVERY on port 3, pid:847
[   12.386029] sas: DONE DISCOVERY on port 3, pid:847, result:0
[   12.386041] sas: DOING DISCOVERY on port 4, pid:847
[   12.386048] sas: DONE DISCOVERY on port 4, pid:847, result:0
[   12.386060] sas: DOING DISCOVERY on port 5, pid:847
[   12.386066] sas: DONE DISCOVERY on port 5, pid:847, result:0
[   12.386077] sas: DOING DISCOVERY on port 6, pid:847
[   12.386083] sas: DONE DISCOVERY on port 6, pid:847, result:0
[   12.386095] sas: DOING DISCOVERY on port 7, pid:847
[   12.386101] sas: DONE DISCOVERY on port 7, pid:847, result:0
[   12.386188] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   12.386281] sas: ata7: end_device-6:0: dev error handler
[   12.537987] ata7.00: ATA-8: ST33000650NS, 0004, max UDMA/133
[   12.537992] ata7.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   12.538227] drivers/scsi/mvsas/mv_sas.c 1857:port 0 slot 0 rx_desc
2 has error info8100.
[   12.538256] ata7.00: failed to get Identify Device Data, Emask 0x1
[   12.539467] drivers/scsi/mvsas/mv_sas.c 1857:port 0 slot 0 rx_desc
2 has error info8100.
[   12.539496] ata7.00: failed to get Identify Device Data, Emask 0x1
[   12.539504] ata7.00: configured for UDMA/133
[   12.539611] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   12.550707] scsi 6:0:0:0: Direct-Access ATA  ST33000650NS
  0004 PQ: 0 ANSI: 5
[   12.550800] sas: Enter sas_scsi_recover_host bus

RE: mvsas regression since 3.5

2012-12-23 Thread Jack Wang


[Jack Wang] 
Do you see bug51881, which may relate to what you see
https://bugzilla.kernel.org/show_bug.cgi?id=51881

btw: also correct Dan's mail address.


I'm using Asus PIKE 6480 SAS card, whose chipset is RAID bus
controller: Marvell Technology Group Ltd. MV64460/64461/64462 System 
Controller, Revision B, with latest stable branch 3.7.1 only 1 of 8 ports 
works, to get others works I got to pull  plug back. While with
3.5.7 it's all good, so it must be a regression.

Here's corresponding boot time dmesg of 3.7.1 with debug compiled:

[5.967302] udevd[1212]: starting version 171
[6.568908] mvsas :04:00.0: mvsas: driver version 0.8.16
[6.569431] mvsas :04:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
[   10.980331] drivers/scsi/mvsas/mv_sas.c 1174:phy 0 attach dev info is 0
[   10.980334] drivers/scsi/mvsas/mv_sas.c 1176:phy 0 attach sas addr is 0
[   11.180909] drivers/scsi/mvsas/mv_sas.c 1174:phy 1 attach dev info is 0
[   11.180912] drivers/scsi/mvsas/mv_sas.c 1176:phy 1 attach sas addr is 1
[   11.381501] drivers/scsi/mvsas/mv_sas.c 1174:phy 2 attach dev info is 0
[   11.381503] drivers/scsi/mvsas/mv_sas.c 1176:phy 2 attach sas addr is 2
[   11.582085] drivers/scsi/mvsas/mv_sas.c 1174:phy 3 attach dev info is 0
[   11.582088] drivers/scsi/mvsas/mv_sas.c 1176:phy 3 attach sas addr is 3
[   11.782673] drivers/scsi/mvsas/mv_sas.c 1174:phy 4 attach dev info is 0
[   11.782676] drivers/scsi/mvsas/mv_sas.c 1176:phy 4 attach sas addr is 4
[   11.983262] drivers/scsi/mvsas/mv_sas.c 1174:phy 5 attach dev info is 800
[   11.983265] drivers/scsi/mvsas/mv_sas.c 1176:phy 5 attach sas addr is 5
[   12.183856] drivers/scsi/mvsas/mv_sas.c 1174:phy 6 attach dev info is 4
[   12.183858] drivers/scsi/mvsas/mv_sas.c 1176:phy 6 attach sas addr is 6
[   12.384438] drivers/scsi/mvsas/mv_sas.c 1174:phy 7 attach dev info is 0
[   12.384441] drivers/scsi/mvsas/mv_sas.c 1176:phy 7 attach sas addr is 7
[   12.384447] scsi6 : mvsas
[   12.385479] drivers/scsi/mvsas/mv_sas.c 277:phy 0 byte dmaded.
[   12.385482] drivers/scsi/mvsas/mv_sas.c 277:phy 1 byte dmaded.
[   12.385488] drivers/scsi/mvsas/mv_sas.c 277:phy 2 byte dmaded.
[   12.385491] drivers/scsi/mvsas/mv_sas.c 277:phy 3 byte dmaded.
[   12.385494] drivers/scsi/mvsas/mv_sas.c 277:phy 4 byte dmaded.
[   12.385497] drivers/scsi/mvsas/mv_sas.c 277:phy 5 byte dmaded.
[   12.385500] drivers/scsi/mvsas/mv_sas.c 277:phy 6 byte dmaded.
[   12.385503] drivers/scsi/mvsas/mv_sas.c 277:phy 7 byte dmaded.
[   12.385614] sas: phy-6:0 added to port-6:0, phy_mask:0x1 (   0)
[   12.385667] sas: phy-6:1 added to port-6:1, phy_mask:0x2 ( 100)
[   12.385716] sas: phy-6:2 added to port-6:2, phy_mask:0x4 ( 200)
[   12.385759] sas: phy-6:3 added to port-6:3, phy_mask:0x8 ( 300)
[   12.385801] sas: phy-6:4 added to port-6:4, phy_mask:0x10 ( 400)
[   12.385851] sas: phy-6:5 added to port-6:5, phy_mask:0x20 ( 500)
[   12.385895] sas: phy-6:6 added to port-6:6, phy_mask:0x40 ( 600)
[   12.385942] sas: phy-6:7 added to port-6:7, phy_mask:0x80 ( 700)
[   12.385967] sas: DOING DISCOVERY on port 0, pid:847
[   12.385974] sas: DONE DISCOVERY on port 0, pid:847, result:0
[   12.385986] sas: DOING DISCOVERY on port 1, pid:847
[   12.385991] sas: DONE DISCOVERY on port 1, pid:847, result:0
[   12.386003] sas: DOING DISCOVERY on port 2, pid:847
[   12.386009] sas: DONE DISCOVERY on port 2, pid:847, result:0
[   12.386020] sas: DOING DISCOVERY on port 3, pid:847
[   12.386029] sas: DONE DISCOVERY on port 3, pid:847, result:0
[   12.386041] sas: DOING DISCOVERY on port 4, pid:847
[   12.386048] sas: DONE DISCOVERY on port 4, pid:847, result:0
[   12.386060] sas: DOING DISCOVERY on port 5, pid:847
[   12.386066] sas: DONE DISCOVERY on port 5, pid:847, result:0
[   12.386077] sas: DOING DISCOVERY on port 6, pid:847
[   12.386083] sas: DONE DISCOVERY on port 6, pid:847, result:0
[   12.386095] sas: DOING DISCOVERY on port 7, pid:847
[   12.386101] sas: DONE DISCOVERY on port 7, pid:847, result:0
[   12.386188] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   12.386281] sas: ata7: end_device-6:0: dev error handler
[   12.537987] ata7.00: ATA-8: ST33000650NS, 0004, max UDMA/133
[   12.537992] ata7.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   12.538227] drivers/scsi/mvsas/mv_sas.c 1857:port 0 slot 0 rx_desc
2 has error info8100.
[   12.538256] ata7.00: failed to get Identify Device Data, Emask 0x1
[   12.539467] drivers/scsi/mvsas/mv_sas.c 1857:port 0 slot 0 rx_desc
2 has error info8100.
[   12.539496] ata7.00: failed to get Identify Device Data, Emask 0x1
[   12.539504] ata7.00: configured for UDMA/133
[   12.539611] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[   12.550707] scsi 6:0:0:0: Direct-Access ATA  ST33000650NS
  0004 PQ: 0 ANSI: 5
[   12.550800] sas: Enter sas_scsi_recover_host busy: 0 failed: 0

Re: [PATCH 00/26] AIO performance improvements/cleanups, v2

2012-12-13 Thread Jack Wang
2012/12/14 Jens Axboe :
> On Mon, Dec 03 2012, Kent Overstreet wrote:
>> Last posting: http://thread.gmane.org/gmane.linux.kernel.aio.general/3169
>>
>> Changes since the last posting should all be noted in the individual
>> patch descriptions.
>>
>>  * Zach pointed out the aio_read_evt() patch was calling functions that
>>could sleep in TASK_INTERRUPTIBLE state, that patch is rewritten.
>>  * Ben pointed out some synchronize_rcu() usage was problematic,
>>converted it to call_rcu()
>>  * The flush_dcache_page() patch is new
>>  * Changed the "use cancellation list lazily" patch so as to remove
>>ki_flags from struct kiocb.
>
> Kent, I ran a few tests, and the below patches still don't seem as fast
> as the approach I took. To keep it fair, I used your aio branch and
> applied by dio speedups too. As a sanity check, I ran with your branch
> alone as well. The quick results below - kaio is kent-aio, just your
> branch. kaio-dio is with the direct IO speedups too. jaio is my branch,
> which already has the dio changes too.
>
> Devices Branch  IOPS
> 1   kaio~915K
> 1   kaio-dio~930K
> 1   jaio   ~1220K
> 6   kaio   ~3050K
> 6   kaio-dio   ~3080K
> 6   jaio3500K
>
> The box runs out of CPU driving power, which is why it doesn't scale
> linearly, otherwise I know that jaio at least does. It's basically
> completion limited for the 6 device test at the moment.
>
> I'll run some profiling tomorrow morning and get you some better
> results. Just thought I'd share these at least.
>
> --
> Jens Axboe
>

A really good performance, woo.

I think the device tested is really fast PCIe SSD builded by fusionio
with fusionio in house block driver?

any compare number with current mainline?

Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/26] AIO performance improvements/cleanups, v2

2012-12-13 Thread Jack Wang
2012/12/14 Jens Axboe jax...@fusionio.com:
 On Mon, Dec 03 2012, Kent Overstreet wrote:
 Last posting: http://thread.gmane.org/gmane.linux.kernel.aio.general/3169

 Changes since the last posting should all be noted in the individual
 patch descriptions.

  * Zach pointed out the aio_read_evt() patch was calling functions that
could sleep in TASK_INTERRUPTIBLE state, that patch is rewritten.
  * Ben pointed out some synchronize_rcu() usage was problematic,
converted it to call_rcu()
  * The flush_dcache_page() patch is new
  * Changed the use cancellation list lazily patch so as to remove
ki_flags from struct kiocb.

 Kent, I ran a few tests, and the below patches still don't seem as fast
 as the approach I took. To keep it fair, I used your aio branch and
 applied by dio speedups too. As a sanity check, I ran with your branch
 alone as well. The quick results below - kaio is kent-aio, just your
 branch. kaio-dio is with the direct IO speedups too. jaio is my branch,
 which already has the dio changes too.

 Devices Branch  IOPS
 1   kaio~915K
 1   kaio-dio~930K
 1   jaio   ~1220K
 6   kaio   ~3050K
 6   kaio-dio   ~3080K
 6   jaio3500K

 The box runs out of CPU driving power, which is why it doesn't scale
 linearly, otherwise I know that jaio at least does. It's basically
 completion limited for the 6 device test at the moment.

 I'll run some profiling tomorrow morning and get you some better
 results. Just thought I'd share these at least.

 --
 Jens Axboe


A really good performance, woo.

I think the device tested is really fast PCIe SSD builded by fusionio
with fusionio in house block driver?

any compare number with current mainline?

Jack
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/