[PATCH] cpuset: Modify the type of use_parent_ecpus from int to bool

2021-03-12 Thread Li Feng
Since use_parent_ecpus in struct cpuset is only ever used as a boolean,
change its type from int to bool.

Signed-off-by: Li Feng 
---
 kernel/cgroup/cpuset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5258b68153e0..ab0bf3cc7093 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -158,7 +158,7 @@ struct cpuset {
 	 * use_parent_ecpus - set if using parent's effective_cpus
 	 * child_ecpus_count - # of children with use_parent_ecpus set
 	 */
-	int use_parent_ecpus;
+	bool use_parent_ecpus;
 	int child_ecpus_count;
 };
 
-- 
2.25.1




Re: [PATCH] nvme: reject the ns when the block size is smaller than a sector

2021-01-14 Thread Li Feng
Christoph Hellwig wrote on Fri, Jan 15, 2021 at 1:43 AM:
>
> On Wed, Jan 13, 2021 at 02:12:59PM -0800, Sagi Grimberg wrote:
> >> But this only catches a physical block size < 512 for NVMe, not any other 
> >> block device.
> >>
> >> Please fix it for the general case in blk_queue_physical_block_size().
> >
> > We actually call that later and would probably be better to check here..
>
> We had a series for that a short while ago that got lost.
Where is that series? Or could you advise on what an acceptable fix for this
issue would look like?
This issue triggers an oops; see:
https://lkml.org/lkml/2021/1/12/1064
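
For illustration only (this is just a rough sketch, not the lost series you
mention; the helper name and the extra power-of-two/PAGE_SIZE policy are my
assumptions), would a general-case check in the block layer along these lines
be acceptable?

/*
 * Sketch: a generic validation helper every driver could call before wiring
 * a device up, so sub-sector block sizes are rejected once in the block
 * layer instead of per driver.  Assumes <linux/log2.h> for is_power_of_2().
 */
static inline int blk_validate_block_size_example(unsigned long bsize)
{
	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
		return -EINVAL;

	return 0;
}

A driver probe path such as nvme_update_ns_info() could then fail the device
when this returns an error, instead of feeding a sub-sector size into the
queue limits.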

Thanks,
Feng Li


[PATCH] nvme: reject the ns when the block size is smaller than a sector

2021-01-13 Thread Li Feng
The NVMe spec (1.4a, Figure 248) says:
"A value smaller than 9 (i.e., 512 bytes) is not supported."

Signed-off-by: Li Feng 
---
 drivers/nvme/host/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f320273fc672..1f02e6e49a05 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2161,6 +2161,12 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
 
 	blk_mq_freeze_queue(ns->disk->queue);
 	ns->lba_shift = id->lbaf[lbaf].ds;
+	if (ns->lba_shift < 9) {
+		pr_warn("%s: bad lba_shift(%d)\n", ns->disk->disk_name, ns->lba_shift);
+		ret = -1;
+		goto out_unfreeze;
+	}
+
 	nvme_set_queue_limits(ns->ctrl, ns->queue);
 
 	if (ns->head->ids.csi == NVME_CSI_ZNS) {
-- 
2.29.2
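
For a quick sense of what the lba_shift < 9 test above rejects, here is a
standalone user-space illustration (the ds values below are made-up examples,
not taken from a real controller): the LBA format "ds" field is a power-of-two
exponent, so ds == 9 means 512-byte blocks and ds == 12 means 4096-byte
blocks, while anything below 9 is sub-sector.

#include <stdio.h>

int main(void)
{
	unsigned int ds;

	/* ds is the per-LBA-format data size exponent: block size = 1 << ds. */
	for (ds = 3; ds <= 12; ds += 3)
		printf("ds=%2u -> block size %4u bytes%s\n",
		       ds, 1u << ds, ds < 9 ? " (rejected)" : "");
	return 0;
}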



[PATCH] cgroup: Remove stale comments

2021-01-12 Thread Li Feng
The function cgroup_mount() has been removed, so remove the related comment
to prevent confusion.

Related commit: 90129625d9203a917f ("cgroup: start switching to fs_context")

Signed-off-by: Li Feng 
---
 kernel/cgroup/cgroup.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 613845769103..493547b4941c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2139,7 +2139,6 @@ static void cgroup_kill_sb(struct super_block *sb)
 	/*
 	 * If @root doesn't have any children, start killing it.
 	 * This prevents new mounts by disabling percpu_ref_tryget_live().
-	 * cgroup_mount() may wait for @root's release.
 	 *
 	 * And don't kill the default root.
 	 */
-- 
2.25.1




Re: [PATCH v2] blk: avoid divide-by-zero with zero granularity

2021-01-12 Thread Li Feng
Yes, rejecting the device is the right fix; I will try to send another patch.
That said, I think this check is still good protection, since other devices
may violate the block size constraint as well.

A divide-by-zero is unacceptable.

Thanks,
Feng Li

Martin K. Petersen wrote on Wed, Jan 13, 2021 at 1:48 AM:
>
>
> Johannes,
>
> >> I use nvme-tcp on the host, the target is an SPDK nvme-tcp target,
> >> and set a wrong block size (e.g. bs=8); then the host prints this oops:
> >
> > I think the better fix here is to reject devices which report a block size
> > smaller than a sector.
>
> Yep, Linux doesn't support logical block sizes < 512 bytes.
>
> Also, the NVMe spec states:
>
> "A value smaller than 9 (i.e., 512 bytes) is not supported."
>
> --
> Martin K. Petersen  Oracle Linux Engineering


[PATCH v2] blk: avoid divide-by-zero with zero granularity

2021-01-12 Thread Li Feng
If both physical_block_size and io_min are smaller than a sector,
'granularity >> SECTOR_SHIFT' will be zero.

Signed-off-by: Li Feng 
---
 include/linux/blkdev.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..b04ad113 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1485,7 +1485,11 @@ static inline int queue_alignment_offset(const struct request_queue *q)
 static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
 {
 	unsigned int granularity = max(lim->physical_block_size, lim->io_min);
-	unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT)
+	unsigned int alignment;
+	if (granularity >> SECTOR_SHIFT == 0)
+		return 0;
+
+	alignment = sector_div(sector, granularity >> SECTOR_SHIFT)
 		<< SECTOR_SHIFT;
 
 	return (granularity + lim->alignment_offset - alignment) % granularity;
-- 
2.29.2
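
To see concretely why the divisor can become zero, here is a standalone
user-space sketch (the 8-byte value mirrors the bs=8 SPDK target mentioned in
this thread; SECTOR_SHIFT is 9, i.e. 512-byte sectors, as in the kernel):

#include <stdio.h>

#define SECTOR_SHIFT 9

int main(void)
{
	/* Example limits reported by a misconfigured target (e.g. bs=8). */
	unsigned int physical_block_size = 8;
	unsigned int io_min = 8;
	unsigned int granularity = physical_block_size > io_min ?
				   physical_block_size : io_min;

	/* 8 >> 9 == 0, so sector_div() would divide by zero without the check. */
	printf("granularity=%u bytes, divisor=%u sectors\n",
	       granularity, granularity >> SECTOR_SHIFT);
	return 0;
}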



[PATCH] blk: avoid divide-by-zero with zero granularity

2021-01-12 Thread Li Feng
If both physical_block_size and io_min are smaller than a sector,
'granularity >> SECTOR_SHIFT' will be zero.

Signed-off-by: Li Feng 
---
 include/linux/blkdev.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94ee3089e01..4d029e95adb4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1485,6 +1485,10 @@ static inline int queue_alignment_offset(const struct request_queue *q)
 static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
 {
 	unsigned int granularity = max(lim->physical_block_size, lim->io_min);
+	granularity = granularity >> SECTOR_SHIFT;
+	if (!granularity)
+		return 0;
+
 	unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT)
 		<< SECTOR_SHIFT;
 
-- 
2.29.2



hwclock make clock_gettime not accurate

2017-03-15 Thread li feng
Hi guys,

I'm doing some tests with clock_gettime(), and I found that clock_gettime()
is affected by hwclock: running hwclock makes clock_gettime() appear to slip
forward by some milliseconds.

The test prints one line every 1 ms.


 $ ./a.out -r CLOCK_MONOTONIC
Using delay=1 ms between loop.
Using clock=CLOCK_MONOTONIC.
Clock resolution sec=0 nsec=1
Initial time sec=1621884 nsec=285113956

[delay=1ms] Slip time: 0 s 32 ms <-hwclock
[delay=1ms] Slip time: 0 s 16 ms <-hwclock
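
(a.out is not shown here; a minimal sketch of this kind of slip measurement,
assuming a 1 ms sleep per iteration and CLOCK_MONOTONIC timestamps, would
look like the following.)

#include <stdio.h>
#include <time.h>

static long long ns_of(const struct timespec *t)
{
	return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
	const struct timespec delay = { 0, 1000000 };	/* 1 ms */
	struct timespec prev, now;

	clock_gettime(CLOCK_MONOTONIC, &prev);
	for (;;) {
		nanosleep(&delay, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);
		/* Anything beyond the 1 ms we asked to sleep is "slip". */
		long long slip_ms = (ns_of(&now) - ns_of(&prev) - 1000000) / 1000000;
		if (slip_ms > 0)
			printf("[delay=1ms] Slip time: %lld ms\n", slip_ms);
		prev = now;
	}
	return 0;
}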

From perf:

$ perf record -F 999 hwclock

# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 22  of event 'cpu-clock'
# Event count (approx.): 22022022
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...............................
#
    77.27%  hwclock  [kernel.kallsyms]  [k] native_read_tsc
    13.64%  hwclock  [kernel.kallsyms]  [k] delay_tsc
     4.55%  hwclock  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     4.55%  hwclock  libc-2.17.so       [.] __strftime_l




$ perf record -F 999 ./a.out

# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 7K of event 'cpu-clock'
# Event count (approx.): 79010100220
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...............................
#
    28.18%  a.out    [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
    20.64%  a.out    [vdso]             [.] __vdso_clock_gettime
    18.46%  a.out    [kernel.kallsyms]  [k] native_read_tsc
     9.81%  a.out    a.out              [.] busy_loop
     4.62%  a.out    a.out              [.] calc_1ms
     2.15%  a.out    libc-2.17.so       [.] clock_gettime
     2.02%  a.out    a.out              [.] overhead_clock

I thought there was lock contention. However, when I ran two instances of
a.out at the same time, the output was correct, not like with hwclock.

Does anyone know why?

Thanks.

