Re: [PATCH] checkpatch.pl: Add SPDX license tag check

2018-02-01 Thread Greg Kroah-Hartman
On Thu, Feb 01, 2018 at 03:14:29PM -0600, Rob Herring wrote:
> Add SPDX license tag check based on the rules defined in
> Documentation/process/license-rules.rst. To summarize, SPDX license tags
> should be on the 1st line (or 2nd line in scripts) using the appropriate
> comment style for the file type.
> 
> Cc: Andy Whitcroft 
> Cc: Joe Perches 
> Cc: Greg Kroah-Hartman 
> Cc: Thomas Gleixner 
> Cc: Philippe Ombredanne 
> Cc: Andrew Morton 
> Signed-off-by: Rob Herring 

Acked-by: Greg Kroah-Hartman 
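
[Editorial note: a minimal illustration of the placement rule being checked — the license and file names here are arbitrary examples of the convention from license-rules.rst, not checkpatch's actual implementation.]

```shell
# For a C source file, the SPDX tag is the very first line:
cat > example.c <<'EOF'
// SPDX-License-Identifier: GPL-2.0
int main(void) { return 0; }
EOF

# For a script, the interpreter line comes first and the tag second:
cat > example.sh <<'EOF'
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
echo hello
EOF

# A simplified version of the check: first line for most files,
# second line for files starting with "#!".
head -n1 example.c | grep -q 'SPDX-License-Identifier:' && echo "example.c: ok"
sed -n 2p example.sh | grep -q 'SPDX-License-Identifier:' && echo "example.sh: ok"
```

The real checkpatch rule also matches the comment style against the file type (e.g. `/* */` for C headers); the snippet above only shows the line-placement part.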


Re: [perf] perf test BPF fails on 4.9.20

2018-02-01 Thread Pintu Kumar
Hi,

perf test bpf prologue generation is failing.
37.2: Test BPF prologue generation   : FAILED!

Try to find probe point from debuginfo.
Matched function: null_lseek [105be32]
Probe point found: null_lseek+0
Searching 'file' variable in context.
Converting variable file into trace event.
converting f_mode in file
file(type:file) has no member f_mode.
An error occurred in debuginfo analysis (-22).
bpf_probe: failed to convert perf probe eventsFailed to add events
selected by BPF
test child finished with -1
 end 
Test BPF filter subtest 1: FAILED!


Is there any fix available for this issue?
I searched 4.15, but could not relate any of the patches to this.


Thanks,
Pintu



On Thu, Feb 1, 2018 at 7:34 PM, Pintu Kumar  wrote:
> Hi,
>
> After enabling DEBUG_INFO in kernel I still get this error for BPF test.
> Please help.
>
> # perf test BPF -v
> .
> Looking at the vmlinux_path (8 entries long)
> Using /usr/lib/debug/boot/vmlinux-4.9.00--amd-x86-64-00071-gd94c220-dirty
> for symbols
> Open Debuginfo file:
> /usr/lib/debug/boot/vmlinux-4.9.00--amd-x86-64-00071-gd94c220-dirty
> Try to find probe point from debuginfo.
> Matched function: null_lseek [105be32]
> Probe point found: null_lseek+0
> Searching 'file' variable in context.
> Converting variable file into trace event.
> converting f_mode in file
> file(type:file) has no member f_mode.
> An error occurred in debuginfo analysis (-22).
> bpf_probe: failed to convert perf probe eventsFailed to add events
> selected by BPF
> test child finished with -1
>  end 
> Test BPF filter subtest 1: FAILED!
>
>
>
> On Thu, Feb 1, 2018 at 10:50 AM, Pintu Kumar  wrote:
>> Dear Masami,
>>
>> Now I am stuck again with 'perf test' failure on 4.9
>>
>> # perf --version
>> perf version 4.9.20-
>>
>> # perf test
>> 16: Try 'import perf' in python, checking link problems  : FAILED!
>> 37.2: Test BPF prologue generation   : FAILED!
>>
>> If you have any clue about these failures, please help me.
>>
>> Here are the verbose output:
>> -
>> 1) # perf test python -v
>> 16: Try 'import perf' in python, checking link problems  :
>> --- start ---
>> test child forked, pid 24562
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> ImportError: No module named perf
>> test child finished with -1
>>  end 
>> Try 'import perf' in python, checking link problems: FAILED!
>> --
>>
>> 2) # perf test BPF -v
>> ---
>> .
>> bpf: config 'func=null_lseek file->f_mode offset orig' is ok
>> Looking at the vmlinux_path (8 entries long)
>> symsrc__init: cannot get elf header.
>> Failed to find the path for kernel: Invalid ELF file
>> bpf_probe: failed to convert perf probe eventsFailed to add events
>> selected by BPF
>> test child finished with -1
>>  end 
>> Test BPF filter subtest 1: FAILED!
>>
>> ---
>>
>>
>> Thanks,
>> Pintu
>>
>>
>> On Wed, Jan 31, 2018 at 9:01 AM, Masami Hiramatsu  
>> wrote:
>>> On Tue, 30 Jan 2018 19:20:36 +0530
>>> Pintu Kumar  wrote:
>>>
 On Tue, Jan 30, 2018 at 11:13 AM, Masami Hiramatsu  
 wrote:
 >
 > On Mon, 29 Jan 2018 22:00:52 +0530
 > Pintu Kumar  wrote:
 >
 > > Dear Masami,
 > >
 > > Thank you so much for your reply.
 > > Please find some of my answers inline.
 > >
 > >
 > > On Mon, Jan 29, 2018 at 7:47 PM, Masami Hiramatsu 
 > >  wrote:
 > > > On Mon, 29 Jan 2018 13:40:34 +0530
 > > > Pintu Kumar  wrote:
 > > >
 > > >> Hi All,
 > > >>
 > > >> 'perf probe' is failing sometimes on 4.9.20 with AMD-64.
 > > >> # perf probe --add schedule
 > > >> schedule is out of .text, skip it.
 > > >>   Error: Failed to add events.
 > > >>
 > > >> If any one have come across this problem please let me know the 
 > > >> cause.
 > > >
 > > > Hi Pintu,
 > > >
 > > > Could you run it with --vv?
 > > >
 > > Ok, I will send verbose output by tomorrow.
 > >
 > > >>
 > > >> Note: I don't have CONFIG_DEBUG_INFO enabled in kernel. Is this the 
 > > >> problem?
 > > >
 > > > Without it, you cannot set source-level probes or trace local 
 > > > variables.
 > > >
 > >
 > > Currently I am facing a problem enabling DEBUG_INFO in our kernel 
 > > 4.9.20.
 > > However, I will try to manually include the "-g" option during compilation.
 > >
 > > >> However, I manually copied the vmlinux file to /boot/ directory, but
 > > >> still it does not work.
 > > >
 > > > That doesn't work.
 > > > The CONFIG_DEBUG_INFO option makes gcc compile the kernel with extra 
 > > > debuginfo.
 > > > Without th
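
[Editorial note: as Masami explains, copying a vmlinux around does not help unless it was actually built with CONFIG_DEBUG_INFO (i.e. with -g). A quick way to see whether a binary carries the DWARF data perf needs is to look for .debug_info sections — a throwaway demonstration, assuming cc and readelf are available:]

```shell
# Compile the same trivial program with and without -g.
printf 'int main(void){return 0;}\n' > t.c
cc -o t_nodbg t.c
cc -g -o t_dbg t.c

# Only the -g build carries DWARF sections such as .debug_info,
# which is what perf probe's debuginfo analysis consumes.
readelf -S t_dbg   | grep -q '\.debug_info' && echo "t_dbg: has DWARF"
readelf -S t_nodbg | grep -q '\.debug_info' || echo "t_nodbg: no DWARF"
```

The same check (`readelf -S vmlinux | grep debug_info`) applies to the kernel image perf is pointed at.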

RE: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread He, Roger
Can you try to use a fixed limit like I suggested once more?
E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Maybe you missed my previous mail; I'll explain again here.
Setting the value to 256MB does not work on my platform. My machine has 8GB of system 
memory and an 8GB swap disk.
On my machine, setting it to 4GB works.
But 4GB also does not work on a test machine with 16GB system memory & 8GB swap disk.


Thanks
Roger(Hongbo.He)

-Original Message-
From: Koenig, Christian 
Sent: Friday, February 02, 2018 3:46 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Can you try to use a fixed limit like I suggested once more?

E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Regards,
Christian.

Am 02.02.2018 um 07:57 schrieb He, Roger:
> 	Using the limit of total RAM * 1/2 seems to work very well.
> 	No OOM, although the swap disk fills up at peak during the piglit test.
>
> But David noticed that this approach has an obvious defect.
> For example, if the platform has 32GB system memory and an 8GB swap disk,
> 1/2 * RAM = 16GB, which is bigger than the swap disk, so no swap for TTM is allowed 
> at all.
> For now we have worked out an improved version based on get_nr_swap_pages().
> Going to send it out later.
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: He, Roger
> Sent: Thursday, February 01, 2018 4:03 PM
> To: Koenig, Christian ; Zhou, 
> David(ChunMing) ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 
> 
> Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Just now, I tried with a fixed limit, but it does not always work.
> For example: setting the limit to 4GB on my platform with 8GB system memory, it 
> can pass.
> But when run on a platform with 16GB system memory, it failed with OOM.
>
> And I guess it also depends on the app's behavior.
> I mean, some apps make the OS use more swap space as well.
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of He, Roger
> Sent: Thursday, February 01, 2018 1:48 PM
> To: Koenig, Christian ; Zhou, 
> David(ChunMing) ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
>   But what we could do is to rely on a fixed limit like the Intel driver 
> does and I suggested before.
>   E.g. don't copy anything into a shmemfile when there is only x MB of 
> swap space left.
>
> Here I think we can take it further: let the limit value scale with total 
> system memory.
> For example: total system memory * 1/2.
> That way it will match the platform configuration better.
>
> 	Roger can you test that approach once more with your fix for the OOM 
> issues in the page fault handler?
>
> Sure. Using the limit of total RAM * 1/2 seems to work very well.
> No OOM, although the swap disk fills up at peak during the piglit test.
> I speculate this case happens but causes no OOM because:
>
> a. After running a while, the swap disk usage gets close to, but not over, 
> 1/2 * total size.
> b. All subsequently swapped pages stay in system memory until there is no space 
> left there.
> 	Then the swapped pages in shmem are flushed to the swap disk. And probably 
> the OS also needs some swap space.
> 	In this case, the swap disk easily becomes full.
> c. But because the free swap size is now < 1/2 * total, no further swap-out 
> happens after that.
> 	And at least 1/4 * system memory will be left, because the check below in 
> ttm_mem_global_reserve ensures that:
> 	if (zone->used_mem > limit)
> 		goto out_unlock;
>  
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: Koenig, Christian
> Sent: Wednesday, January 31, 2018 4:13 PM
> To: He, Roger ; Zhou, David(ChunMing) 
> ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
> driver does and I suggested before.
>
> E.g. don't copy anything into a shmemfile when there is only x MB of swap 
> space left.
>
> Roger can you test that approach once more with your fix for the OOM issues 
> in the page fault handler?
>
> Thanks,
> Christian.
>
> Am 31.01.2018 um 09:08 schrieb He, Roger:
>> 	I think this patch isn't needed at all. You can directly read the 
>> total_swap_pages variable in TTM.
>>
>> Because the variable is not exported by EXPORT_SYMBOL_GPL, using it directly 
>> will result in:
>> "WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".
>>
>> Thanks
>> Roger(Hongbo.He)
>> -Original Message-
>> From: dri-devel [mailto:dri

Re: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread Christian König

Can you try to use a fixed limit like I suggested once more?

E.g. just stop swapping if get_nr_swap_pages() < 256MB.

Regards,
Christian.

Am 02.02.2018 um 07:57 schrieb He, Roger:

Using the limit of total RAM * 1/2 seems to work very well.
No OOM, although the swap disk fills up at peak during the piglit test.

But David noticed that this approach has an obvious defect.
For example, if the platform has 32GB system memory and an 8GB swap disk,
1/2 * RAM = 16GB, which is bigger than the swap disk, so no swap for TTM is allowed 
at all.
For now we have worked out an improved version based on get_nr_swap_pages().
Going to send it out later.

Thanks
Roger(Hongbo.He)
-Original Message-
From: He, Roger
Sent: Thursday, February 01, 2018 4:03 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 

Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Just now, I tried with a fixed limit, but it does not always work.
For example: setting the limit to 4GB on my platform with 8GB system memory, it can 
pass.
But when run on a platform with 16GB system memory, it failed with OOM.

And I guess it also depends on the app's behavior.
I mean, some apps make the OS use more swap space as well.

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
He, Roger
Sent: Thursday, February 01, 2018 1:48 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

But what we could do is to rely on a fixed limit like the Intel driver 
does and I suggested before.
E.g. don't copy anything into a shmemfile when there is only x MB of 
swap space left.

Here I think we can take it further: let the limit value scale with total 
system memory.
For example: total system memory * 1/2.
That way it will match the platform configuration better.

Roger can you test that approach once more with your fix for the OOM 
issues in the page fault handler?

Sure. Using the limit of total RAM * 1/2 seems to work very well.
No OOM, although the swap disk fills up at peak during the piglit test.
I speculate this case happens but causes no OOM because:

a. After running a while, the swap disk usage gets close to, but not over, 
1/2 * total size.
b. All subsequently swapped pages stay in system memory until there is no space 
left there.
  Then the swapped pages in shmem are flushed to the swap disk. And probably 
the OS also needs some swap space.
  In this case, the swap disk easily becomes full.
c. But because the free swap size is now < 1/2 * total, no further swap-out 
happens after that.
 And at least 1/4 * system memory will be left, because the check below in 
ttm_mem_global_reserve ensures that:
if (zone->used_mem > limit)
goto out_unlock;
 
Thanks

Roger(Hongbo.He)
-Original Message-
From: Koenig, Christian
Sent: Wednesday, January 31, 2018 4:13 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
driver does and I suggested before.

E.g. don't copy anything into a shmemfile when there is only x MB of swap space 
left.

Roger can you test that approach once more with your fix for the OOM issues in 
the page fault handler?

Thanks,
Christian.

Am 31.01.2018 um 09:08 schrieb He, Roger:

I think this patch isn't needed at all. You can directly read the 
total_swap_pages variable in TTM.

Because the variable is not exported by EXPORT_SYMBOL_GPL, using it directly will 
result in:
"WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On
Behalf Of Chunming Zhou
Sent: Wednesday, January 31, 2018 3:15 PM
To: He, Roger ; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; Koenig,
Christian 
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to
expose total_swap_pages

Hi Roger,

I think this patch isn't needed at all. You can directly read the total_swap_pages 
variable in TTM. See the comment:

/* protected with swap_lock. reading in vm_swap_full() doesn't need lock */
long total_swap_pages;

There are many places using it directly; you just can't change its value. 
Reading it doesn't need the lock.


Regards,

David Zhou


On 2018年01月29日 16:29, Roger He wrote:

ttm module needs it to determine its internal parameter setting.

Signed-off-by: Roger He 
---
include/linux/swap.h |  6 ++
mm/swapfile.c| 15 +++
2 files changed, 21 insertions(+)

drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of macro 'if'

2018-02-01 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   4bf772b14675411a69b3c807f73006de0fe4b649
commit: 872f3578241d7e648b3bfcf6451a55faf97ce2e9 RDMA/bnxt_re: Add support for 
MRs with Huge pages
date:   2 weeks ago
config: i386-randconfig-sb0-02021411 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 872f3578241d7e648b3bfcf6451a55faf97ce2e9
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:2: warning: left shift count >= 
width of type
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
   In file included from include/linux/kernel.h:10:0,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
>> include/linux/compiler.h:61:17: warning: left shift count >= width of type
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^
>> drivers/infiniband/hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of macro 
>> 'if'
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband/hw/bnxt_re/ib_verbs.c:3317:4: warning: left shift count 
>= width of type
   length, BNXT_RE_MAX_MR_SIZE);
   ^
--
   drivers/infiniband//hw/bnxt_re/ib_verbs.c: In function 
'bnxt_re_query_device':
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:149:2: warning: left shift count 
>= width of type
 ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_reg_user_mr':
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: warning: left shift count 
>= width of type
   In file included from include/linux/kernel.h:10:0,
from include/linux/interrupt.h:6,
from drivers/infiniband//hw/bnxt_re/ib_verbs.c:39:
>> include/linux/compiler.h:61:17: warning: left shift count >= width of type
  static struct ftrace_branch_data   \
^
   include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3315:2: note: in expansion of 
macro 'if'
 if (length > BNXT_RE_MAX_MR_SIZE) {
 ^
   drivers/infiniband//hw/bnxt_re/ib_verbs.c:3317:4: warning: left shift count 
>= width of type
   length, BNXT_RE_MAX_MR_SIZE);
   ^

vim +/if +3315 drivers/infiniband/hw/bnxt_re/ib_verbs.c

872f35782 Somnath Kotur   2018-01-11  3302  
1ac5a4047 Selvin Xavier   2017-02-10  3303  /* uverbs */
1ac5a4047 Selvin Xavier   2017-02-10  3304  struct ib_mr 
*bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
1ac5a4047 Selvin Xavier   2017-02-10  3305u64 
virt_addr, int mr_access_flags,
1ac5a4047 Selvin Xavier   2017-02-10  3306
struct ib_udata *udata)
1ac5a4047 Selvin Xavier   2017-02-10  3307  {
1ac5a4047 Selvin Xavier   2017-02-10  3308  struct bnxt_re_pd *pd = 
container_of(ib_pd, struct bnxt_re_pd, ib_pd);
1ac5a4047 Selvin Xavier   2017-02-10  3309  struct bnxt_re_dev *rdev = 
pd->rdev;
1ac5a4047 Selvin Xavier   2017-02-10  3310  struct bnxt_re_mr *mr;
1ac5a4047 Selvin Xavier   2017-02-10  3311  struct ib_umem *umem;
872f35782 Somnath Kotur   2018-01-11  3312  u64 *pbl_tbl = NULL;
872f35782 Somnath Kotur   2018-01-11  3313  int umem_pgs, page_shift, rc;
1ac5a4047 Selvin Xavier   2017-02-10  3314  
58d4a671d Selvin Xavier   2017-06-29 @3315  if (length > 
BNXT_RE_MAX_MR_SIZE) {
58d4a671d Selvin Xavier   2017-06-29  3316  
dev_err(rdev_to_dev(rdev), "MR Size: %lld > Max supported:%ld\n",
58d4a671d Selvin Xavier   2017-06-29  3317  length, 
BNXT_RE_MAX_MR_SIZE);
58d4a671d Selvin Xavier   2017-06-29  3318  return ERR_PTR(-ENOMEM);
58d4a671d Selvin Xavier   2017-06-29  3319  }
58d4a671d Selvin Xavier   2017-06-29  3320  
1ac5a4047 Selvin Xavier   2017-02-10  3321  mr = kzalloc(sizeof(*mr), 
GFP_KERNEL);
1ac5a4047 Selvin Xavier   2017-02-10  3322  if (!mr)
1ac5a4047 Selvin Xavier   2017-02-10  3323  return ERR_PTR(-ENOMEM);
1ac5a4047 Selvin Xav

Re: [git pull] drm pull for v4.16-rc1

2018-02-01 Thread Daniel Stone
On 2 February 2018 at 02:50, Dave Airlie  wrote:
> On 2 February 2018 at 12:44, Linus Torvalds
>  wrote:
>> On Thu, Feb 1, 2018 at 6:22 PM, Dave Airlie  wrote:
>>>
>>> Turned out I was running on Wayland instead of X.org and my cut-n-paste from
>>> gedit to Firefox got truncated, weird. I'll go annoy some people and make 
>>> sure
>>> it doesn't happen again.
>>
>> Heh, so there's some Wayland clipboard buffer limit.
>
> Yup, or some bug in getting the second chunk across from one place to another.

The transfer part of Wayland's clipboard protocol is an FD passed between
the two clients so they can send data directly. But Firefox isn't yet
native, and I can fully believe that GNOME Shell's Xwayland clipboard
translator isn't perfect.

>> But that reminds me: is there any *standard* tool to programmatically
>> feed into the clipboard?
>>
>> I occasionally do things like
>>
>> git shortlog A..B | xsel
>>
>> in order to then paste it into some browser window or other.
>>
>> And sure, that works well. But I do it seldom enough that I never
>> remember the command, and half the time it's not even installed
>> because I've switched machines or something, and xsel is always some
>> add-on.
>>
>> What's the thing "real" X people do/use?
>
> I use gedit to move things from files to the clipboard now, for mostly the same 
> reasons;
> I know it's usually installed. xclip and xsel are two utilities I know
> of, but I don't
> think anything gets installed by default.

That's the state of the art for X11.

Cheers,
Daniel


Re: [RFC PATCH 1/9] media: add request API core and UAPI

2018-02-01 Thread Tomasz Figa
On Fri, Feb 2, 2018 at 4:33 PM, Sakari Ailus
 wrote:
>> >> +/**
>> >> + * struct media_request_queue - queue of requests
>> >> + *
>> >> + * @mdev:media_device that manages this queue
>> >> + * @ops: implementation of the queue
>> >> + * @mutex:   protects requests, active_request, req_id, and all members 
>> >> of
>> >> + *   struct media_request
>> >> + * @active_request: request being currently run by this queue
>> >> + * @requests:list of requests (not in any particular order) that 
>> >> this
>> >> + *   queue owns.
>> >> + * @req_id:  counter used to identify requests for debugging purposes
>> >> + */
>> >> +struct media_request_queue {
>> >> + struct media_device *mdev;
>> >> + const struct media_request_queue_ops *ops;
>> >> +
>> >> + struct mutex mutex;
>> >
>> > Any particular reason for using a mutex? The request queue lock will need
>> > to be acquired from interrupts, too, so this should be changed to a
>> > spinlock.
>>
>> Will it be acquired from interrupts? In any case it should be possible
>> to change this to a spinlock.
>
> Using mutexes will effectively make this impossible, and I don't think we
> can safely say there's not going to be a need for that. So spinlocks,
> please.
>

IMHO, whether a mutex or a spinlock is the right thing depends on what
kind of critical section it is used for. If it only protects data (and
according to the comment, this one seems to), a spinlock might
actually have better properties, e.g. not introducing the need to
reschedule if another CPU is accessing the data at the moment. It
might also depend on how heavy the data accesses are, though. We
shouldn't need to spin for too long a time.

Best regards,
Tomasz


Re: [PATCH] esp4: remove redundant initialization of pointer esph

2018-02-01 Thread Steffen Klassert
On Tue, Jan 30, 2018 at 02:53:48PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Pointer esph is being assigned a value that is never read, esph is
> re-assigned and only read inside an if statement, hence the
> initialization is redundant and can be removed.
> 
> Cleans up clang warning:
> net/ipv4/esp4.c:657:21: warning: Value stored to 'esph' during
> its initialization is never read
> 
> Signed-off-by: Colin Ian King 

I've queued this for ipsec-next, will be applied
after the merge window.


Re: [RFC PATCH 1/9] media: add request API core and UAPI

2018-02-01 Thread Sakari Ailus
Hi Alexandre,

On Tue, Jan 30, 2018 at 01:23:05PM +0900, Alexandre Courbot wrote:
> Hi Sakari, thanks for the review!
> 
> The version you reviewed is not the latest one, but I suppose most of
> your comments still apply.
> 
> On Fri, Jan 26, 2018 at 5:39 PM, Sakari Ailus  wrote:
> > Hi Alexandre,
> >
> > I remember it was discussed that the work after the V4L2 jobs API would
> > continue from the existing request API patches. I see that at least the
> > rather important support for events is missing in this version. Why was it
> > left out?
> 
> Request completion is signaled by polling on the request FD, so we
> don't need to rely on V4L2 events to signal this anymore. If we want
> to signal different kinds of events on requests we could implement a
> more sophisticated event system on top of that, but for our current
> needs polling is sufficient.

Right. This works for now indeed. We will need to revisit this when
requests are moved to the media device in the future.

> 
> What other kind of event besides completion could we want to deliver
> to user-space from a request?
> 
> >
> > I also see that variable size IOCTL argument support is no longer included.
> 
> Do we need this for the request API?

Technically there's no strict need for that now. However when the requests
are moved to the media device (i.e. other device nodes are not needed
anymore), then this is a must.

It was proposed and AFAIR agreed on as well that new media device
IOCTLs would not use reserved fields any longer but rely on variable size
IOCTL arguments instead. This is in line with your request argument struct
having no reserved fields and I don't think we should add them there.

> 
> >
> > On Fri, Dec 15, 2017 at 04:56:17PM +0900, Alexandre Courbot wrote:
> >> The request API provides a way to group buffers and device parameters
> >> into units of work to be queued and executed. This patch introduces the
> >> UAPI and core framework.
> >>
> >> This patch is based on the previous work by Laurent Pinchart. The core
> >> has changed considerably, but the UAPI is mostly untouched.
> >>
> >> Signed-off-by: Alexandre Courbot 
> >> ---
> >>  drivers/media/Makefile   |   3 +-
> >>  drivers/media/media-device.c |   6 +
> >>  drivers/media/media-request.c| 390 
> >> +++
> >>  drivers/media/v4l2-core/v4l2-ioctl.c |   2 +-
> >>  include/media/media-device.h |   3 +
> >>  include/media/media-entity.h |   6 +
> >>  include/media/media-request.h| 269 
> >>  include/uapi/linux/media.h   |  11 +
> >>  8 files changed, 688 insertions(+), 2 deletions(-)
> >>  create mode 100644 drivers/media/media-request.c
> >>  create mode 100644 include/media/media-request.h
> >>
> >> diff --git a/drivers/media/Makefile b/drivers/media/Makefile
> >> index 594b462ddf0e..985d35ec6b29 100644
> >> --- a/drivers/media/Makefile
> >> +++ b/drivers/media/Makefile
> >> @@ -3,7 +3,8 @@
> >>  # Makefile for the kernel multimedia device drivers.
> >>  #
> >>
> >> -media-objs   := media-device.o media-devnode.o media-entity.o
> >> +media-objs   := media-device.o media-devnode.o media-entity.o \
> >> +media-request.o
> >>
> >>  #
> >>  # I2C drivers should come before other drivers, otherwise they'll fail
> >> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> >> index e79f72b8b858..045cec7d2de9 100644
> >> --- a/drivers/media/media-device.c
> >> +++ b/drivers/media/media-device.c
> >> @@ -32,6 +32,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>
> >>  #ifdef CONFIG_MEDIA_CONTROLLER
> >>
> >> @@ -407,6 +408,7 @@ static const struct media_ioctl_info ioctl_info[] = {
> >>   MEDIA_IOC(ENUM_LINKS, media_device_enum_links, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >>   MEDIA_IOC(SETUP_LINK, media_device_setup_link, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >>   MEDIA_IOC(G_TOPOLOGY, media_device_get_topology, 
> >> MEDIA_IOC_FL_GRAPH_MUTEX),
> >> + MEDIA_IOC(REQUEST_CMD, media_device_request_cmd, 0),
> >>  };
> >>
> >>  static long media_device_ioctl(struct file *filp, unsigned int cmd,
> >> @@ -688,6 +690,10 @@ EXPORT_SYMBOL_GPL(media_device_init);
> >>
> >>  void media_device_cleanup(struct media_device *mdev)
> >>  {
> >> + if (mdev->req_queue) {
> >> + mdev->req_queue->ops->release(mdev->req_queue);
> >> + mdev->req_queue = NULL;
> >> + }
> >>   ida_destroy(&mdev->entity_internal_idx);
> >>   mdev->entity_internal_idx_max = 0;
> >>   media_graph_walk_cleanup(&mdev->pm_count_walk);
> >> diff --git a/drivers/media/media-request.c b/drivers/media/media-request.c
> >> new file mode 100644
> >> index ..15dc65ddfe41
> >> --- /dev/null
> >> +++ b/drivers/media/media-request.c
> >> @@ -0,0 +1,390 @@
> >> +/*
> >> + * Request and request queue base management
> >> + *
> >> + * Copyright (C) 2017, The Chromium OS Authors.  All rights reserved.
> >> +

Re: KASAN: stack-out-of-bounds Read in xfrm_state_find (4)

2018-02-01 Thread Steffen Klassert
On Thu, Feb 01, 2018 at 11:30:00AM +0100, Dmitry Vyukov wrote:
> On Thu, Feb 1, 2018 at 9:34 AM, Steffen Klassert
> 
> Hi Steffen,
> 
> Please see the email footer:
> 
> > If you want to test a patch for this bug, please reply with:
> > #syz test: git://repo/address.git branch
> > and provide the patch inline or as an attachment.

Thanks for the hint, I had overlooked this. This is very useful
for the case where I cannot reproduce the bug, but I think I know
how to fix it.

There are two more cases that come to my mind where syzbot could
help.

1. I cannot reproduce the bug and I don't know how to fix it,
   but some debug output would be helpful:

   syz test-debug-patch-and-send-dmesg-output: git://repo/address.git branch

2. I cannot reproduce the bug and I have absolutely no idea what it
   could be:

   syz bisect: git://repo/address.git branch commit a commit b

I don't know if this is possible, but it would bring the bugfixing
process a bit closer to the case where a real user files a bug report.


#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master


Subject: [PATCH RFC] xfrm: Refuse to insert 32 bit userspace socket policies on 
64 bit systems

We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
when inserting from 32-bit userspace, as we already do
with policies inserted via netlink.

Reported-by: syzbot+e1a1577ca8bcb47b7...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_state.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a3785f538018..25861a4ef872 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2056,6 +2056,11 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 
__user *optval, int optlen
struct xfrm_mgr *km;
struct xfrm_policy *pol = NULL;
 
+#ifdef CONFIG_COMPAT
+   if (in_compat_syscall())
+   return -EOPNOTSUPP;
+#endif
+
if (optlen <= 0 || optlen > PAGE_SIZE)
return -EMSGSIZE;
 
-- 
2.14.1



Re: [PATCH v11 3/3] mm, x86: display pkey in smaps only if arch supports pkeys

2018-02-01 Thread Ram Pai
On Fri, Feb 02, 2018 at 12:27:27PM +0800, kbuild test robot wrote:
> Hi Ram,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.15 next-20180201]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Ram-Pai/mm-x86-powerpc-Enhancements-to-Memory-Protection-Keys/20180202-120004
> config: x86_64-randconfig-x005-201804 (attached as .config)
> compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from arch/x86/include/asm/mmu_context.h:8:0,
> from arch/x86/events/core.c:36:
> >> include/linux/pkeys.h:16:23: error: expected identifier or '(' before numeric constant
> #define vma_pkey(vma) 0
>   ^
> >> arch/x86/include/asm/mmu_context.h:298:19: note: in expansion of macro 'vma_pkey'
> static inline int vma_pkey(struct vm_area_struct *vma)
>   ^~~~
> 
> vim +16 include/linux/pkeys.h
> 
>  7
>  8#ifdef CONFIG_ARCH_HAS_PKEYS
>  9#include 
> 10#else /* ! CONFIG_ARCH_HAS_PKEYS */
> 11#define arch_max_pkey() (1)
> 12#define execute_only_pkey(mm) (0)
> 13#define arch_override_mprotect_pkey(vma, prot, pkey) (0)
> 14#define PKEY_DEDICATED_EXECUTE_ONLY 0
> 15#define ARCH_VM_PKEY_FLAGS 0
>   > 16#define vma_pkey(vma) 0

Oops. Thanks for catching the issue. The following fix will resolve the error.

diff --git a/arch/x86/include/asm/mmu_context.h
b/arch/x86/include/asm/mmu_context.h
index 6d16d15..c1aeb19 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -238,11 +238,6 @@ static inline int vma_pkey(struct vm_area_struct
*vma)
 
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
}
-#else
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
 #endif

RP



Re: [PATCH v2] ACPI / tables: Add IORT to injectable table list

2018-02-01 Thread Yang, Shunyong
Hi, Hanjun

On Wed, 2018-01-31 at 21:32 +0800, Hanjun Guo wrote:
> Hi Shunyong,
> 
> On 2018/1/30 9:44, Yang, Shunyong wrote:
> > 
> > Hi, Rafael
> > 
> > Could you please help to review this patch? This is a small change
> > to
> > add ACPI_SIG_IORT to table_sigs[]. 
> > Loading IORT table from initrd is very useful to debug SMMU
> > node/device
> > probe, MSI allocation, stream id translation and verifying IORT
> > table
> > from firmware. So, I add this.
> It's true, mappings in IORT are easy to get wrong, so it would be
> good to test them without updating the firmware.
> 
> But I think you'd better add a comment about why you need
> IORT to the commit message of your patch; that will be useful
> to convince Rafael to take it.
> 

Thanks for your suggestion. I will add detailed information to commit
message and send out v3 later.

Thanks.
Shunyong.


Re: [PATCH] ARM: dts: imx6q-bx50v3: Enable secure-reg-access

2018-02-01 Thread Shawn Guo
+ Frank

On Mon, Jan 15, 2018 at 05:07:22PM +0100, Sebastian Reichel wrote:
> From: Peter Senna Tschudin 
> 
> Add secure-reg-access on device tree include file for Bx50 devices
> to enable PMU and hardware counters for perf.
> 
> Signed-off-by: Peter Senna Tschudin 
> Signed-off-by: Sebastian Reichel 
> ---
>  arch/arm/boot/dts/imx6q-bx50v3.dtsi | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-bx50v3.dtsi b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> index 86cfd4481e72..ccaaee83e2fa 100644
> --- a/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> +++ b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> @@ -43,6 +43,13 @@
>  #include "imx6q-ba16.dtsi"
>  
>  / {
> + soc {
> + pmu {
> + compatible = "arm,cortex-a9-pmu";
> + secure-reg-access;

I'm not sure this could be a board level configuration.  Shouldn't this
property just be added into pmu node in imx6qdl.dtsi?

Shawn

> + };
> + };
> +
>   clocks {
>   mclk: clock@0 {
>   compatible = "fixed-clock";
> -- 
> 2.15.1
> 


Re: [PATCH v4 02/10] ufs: sysfs: device descriptor

2018-02-01 Thread gre...@linuxfoundation.org
On Fri, Feb 02, 2018 at 12:25:46AM +, Bart Van Assche wrote:
> On Thu, 2018-02-01 at 18:15 +0200, Stanislav Nijnikov wrote:
> > +enum ufs_desc_param_size {
> > +   UFS_PARAM_BYTE_SIZE = 1,
> > +   UFS_PARAM_WORD_SIZE = 2,
> > +   UFS_PARAM_DWORD_SIZE= 4,
> > +   UFS_PARAM_QWORD_SIZE= 8,
> > +};
> 
> Please do not copy bad naming choices from the Windows kernel into the Linux
> kernel. Using names like WORD / DWORD / QWORD is much less readable than using
> the numeric constants 2, 4, 8. Hence my proposal to leave out the above enum
> completely.

Are you sure those do not come from the spec itself?  It's been a while
since I last read it, but for some reason I remember those types of
names being in there.  But I might be confusing specs here.

thanks,

greg k-h


Re: [PATCH] ARM: dts: imx6q-bx50v3: disable SD card (usdhc2)

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:44:24PM +0100, Sebastian Reichel wrote:
> From: Ian Ray 
> 
> Disable the SD card interface from devicetree.
> 
> Signed-off-by: Ian Ray 
> Signed-off-by: Sebastian Reichel 

I applied the patch [1] from Ian.

Shawn

[1] https://www.spinics.net/lists/devicetree/msg209294.html

> ---
>  arch/arm/boot/dts/imx6q-bx50v3.dtsi | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx6q-bx50v3.dtsi b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> index aefce581c0c3..86cfd4481e72 100644
> --- a/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> +++ b/arch/arm/boot/dts/imx6q-bx50v3.dtsi
> @@ -328,6 +328,10 @@
>   };
>  };
>  
> +&usdhc2 {
> + status = "disabled";
> +};
> +
>  &usdhc4 {
>   pinctrl-names = "default";
>   pinctrl-0 = <&pinctrl_usdhc4>;
> -- 
> 2.15.1
> 


[PATCH 2/2] KVM: X86: Add per-VM no-HLT-exiting capability

2018-02-01 Thread Wanpeng Li
From: Wanpeng Li 

If host CPUs are dedicated to a VM, we can avoid VM exits on HLT.
This patch adds the per-VM non-HLT-exiting capability.

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/api.txt | 11 +++
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/vmx.c| 21 +
 arch/x86/kvm/x86.c|  5 +
 arch/x86/kvm/x86.h|  5 +
 include/uapi/linux/kvm.h  |  1 +
 6 files changed, 45 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e5f1743..573a3e5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4222,6 +4222,17 @@ enables QEMU to build error log and branch to guest kernel registered
 machine check handling routine. Without this capability KVM will
 branch to guests' 0x200 interrupt vector.
 
+7.13 KVM_CAP_X86_GUEST_HLT
+
+Architectures: x86
+Parameters: none
+Returns: 0 on success
+
+This capability indicates that a guest using HLT to stop a virtual CPU
+will not cause a VM exit. As such, time spent while a virtual CPU is
+halted in this way will then be accounted for as guest running time on
+the host, KVM_FEATURE_PV_UNHALT should be disabled.
+
 8. Other capabilities.
 --
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dd6f57a..c566ea0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,8 @@ struct kvm_arch {
 
gpa_t wall_clock;
 
+   bool hlt_in_guest;
+
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1e554a..6cfd8d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2478,6 +2478,19 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit
return 0;
 }
 
+static void vmx_clear_hlt(struct kvm_vcpu *vcpu)
+{
+   /*
+* Ensure that we clear the HLT state in the VMCS.  We don't need to
+* explicitly skip the instruction because if the HLT state is set,
+* then the instruction is already executing and RIP has already been
+* advanced.
+*/
+   if (kvm_hlt_in_guest(vcpu->kvm) &&
+   vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
+   vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+}
+
 static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2508,6 +2521,8 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
intr_info |= INTR_TYPE_HARD_EXCEPTION;
 
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr_info);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static bool vmx_rdtscp_supported(void)
@@ -5301,6 +5316,8 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
exec_control |= CPU_BASED_CR3_STORE_EXITING |
CPU_BASED_CR3_LOAD_EXITING  |
CPU_BASED_INVLPG_EXITING;
+   if (kvm_hlt_in_guest(vmx->vcpu.kvm))
+   exec_control &= ~CPU_BASED_HLT_EXITING;
return exec_control;
 }
 
@@ -5729,6 +5746,8 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
} else
intr |= INTR_TYPE_EXT_INTR;
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, intr);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
@@ -5759,6 +5778,8 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
+
+   vmx_clear_hlt(vcpu);
 }
 
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c13cd14..a508247 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2740,6 +2740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SET_BOOT_CPU_ID:
case KVM_CAP_SPLIT_IRQCHIP:
case KVM_CAP_IMMEDIATE_EXIT:
+   case KVM_CAP_X86_GUEST_HLT:
r = 1;
break;
case KVM_CAP_ADJUST_CLOCK:
@@ -4061,6 +4062,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 
r = 0;
break;
+   case KVM_CAP_X86_GUEST_HLT:
+   kvm->arch.hlt_in_guest = cap->args[0];
+   r = 0;
+   break;
default:
r = -EINVAL;
break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b91215d..96fe84e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -270,4 +270,9 @@ static inline bool kvm_mwait_in_guest(void)
!boot_cpu_has_bug(X86_BUG_MONITOR);
 }
 
+static inline bool kvm_hlt_in_guest(struct kvm *kvm)
+{
+   return kvm->arch.hlt_in_guest;
+}
+
 #endif
diff --git a/include/uapi/linux/kvm

[PATCH 1/2] KVM: X86: Add dedicated pCPU hint PV_DEDICATED

2018-02-01 Thread Wanpeng Li
From: Wanpeng Li 

Waiman Long mentioned that:

 Generally speaking, unfair lock performs well for VMs with a small
 number of vCPUs. Native qspinlock may perform better than pvqspinlock
 if there is vCPU pinning and there is no vCPU over-commitment.

This patch adds a performance hint to allow the hypervisor admin to choose
qspinlock when a dedicated pCPU is available.

PV_DEDICATED = 1, PV_UNHALT = anything: default is qspinlock
PV_DEDICATED = 0, PV_UNHALT = 1: default is Hybrid PV queued/unfair lock
PV_DEDICATED = 0, PV_UNHALT = 0: default is tas

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Wanpeng Li 
---
 Documentation/virtual/kvm/cpuid.txt  | 6 ++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c| 6 ++
 3 files changed, 13 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 87a7506..c0740b1 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,12 @@ KVM_FEATURE_PV_UNHALT  || 7 || guest checks this feature bit
||   || before enabling paravirtualized
||   || spinlock support.
 --
+KVM_FEATURE_PV_DEDICATED   || 8 || guest checks this feature bit
+   ||   || to determine if they run on
+   ||   || dedicated vCPUs, allowing opti-
+   ||   || mizations such as usage of
+   ||   || qspinlocks.
+--
 KVM_FEATURE_PV_TLB_FLUSH   || 9 || guest checks this feature bit
||   || before enabling paravirtualized
||   || tlb flush.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6cfa9c8..9a5ef67 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_PV_DEDICATED   8
 #define KVM_FEATURE_PV_TLB_FLUSH   9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT10
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index aa2b706..6f0e43f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -726,6 +726,12 @@ void __init kvm_spinlock_init(void)
 {
if (!kvm_para_available())
return;
+
+   if (kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
+   static_branch_disable(&virt_spin_lock_key);
+   return;
+   }
+
/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;
-- 
2.7.4



[PATCH 0/2] KVM: X86: Add dedicated pCPU hint and per-VM non-HLT-exiting capability

2018-02-01 Thread Wanpeng Li
Waiman Long mentioned that:

 Generally speaking, unfair lock performs well for VMs with a small
 number of vCPUs. Native qspinlock may perform better than pvqspinlock
 if there is vCPU pinning and there is no vCPU over-commitment.

This patchset adds a PV_DEDICATED performance hint to allow the hypervisor
admin to choose qspinlock when a dedicated pCPU is available.

In addition, following the original discussion of HLT in VMX non-root mode,
https://www.spinics.net/lists/kvm/msg152397.html, this patchset also
adds a per-VM non-HLT-exiting capability to further improve performance
in dedicated pCPU scenarios.

Wanpeng Li (2):
  KVM: X86: Add dedicated pCPU hint PV_DEDICATED
  KVM: X86: Add per-VM no-HLT-exiting capability

 Documentation/virtual/kvm/api.txt| 11 +++
 Documentation/virtual/kvm/cpuid.txt  |  6 ++
 arch/x86/include/asm/kvm_host.h  |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c|  6 ++
 arch/x86/kvm/vmx.c   | 21 +
 arch/x86/kvm/x86.c   |  5 +
 arch/x86/kvm/x86.h   |  5 +
 include/uapi/linux/kvm.h |  1 +
 9 files changed, 58 insertions(+)

-- 
2.7.4



[PATCH 5/6] nvme-pci: discard wait timeout when delete cq/sq

2018-02-01 Thread Jianchao Wang
Currently, nvme_disable_io_queues can be woken up by both the request
completion path and the wait timeout path. This is unnecessary and can
introduce a race between nvme_dev_disable and the request timeout path:
when a delete cq/sq command expires, nvme_disable_io_queues is also
woken up, returns to nvme_dev_disable, and then handles the outstanding
requests, racing with the request timeout path.

To fix it, just use wait_for_completion instead of the timeout variant.
The request timeout path will wake it up.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5b192b0..a838713c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2048,7 +2048,6 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 static void nvme_disable_io_queues(struct nvme_dev *dev)
 {
int pass, queues = dev->online_queues - 1;
-   unsigned long timeout;
u8 opcode = nvme_admin_delete_sq;
 
for (pass = 0; pass < 2; pass++) {
@@ -2056,15 +2055,12 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 
reinit_completion(&dev->ioq_wait);
  retry:
-   timeout = ADMIN_TIMEOUT;
for (; i > 0; i--, sent++)
if (nvme_delete_queue(&dev->queues[i], opcode))
break;
 
while (sent--) {
-   timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
-   if (timeout == 0)
-   return;
+   wait_for_completion(&dev->ioq_wait);
if (i)
goto retry;
}
-- 
2.7.4



[PATCH 3/6] blk-mq: make blk_mq_rq_update_aborted_gstate a external interface

2018-02-01 Thread Jianchao Wang
No functional change; just make blk_mq_rq_update_aborted_gstate an
external interface.

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 01f271d..a027ca2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -581,7 +581,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
 }
 
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
 {
unsigned long flags;
 
@@ -597,6 +597,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
u64_stats_update_end(&rq->aborted_gstate_sync);
local_irq_restore(flags);
 }
+EXPORT_SYMBOL(blk_mq_rq_update_aborted_gstate);
 
 static u64 blk_mq_rq_aborted_gstate(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49..ad54024 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,6 +257,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_complete_request(struct request *rq);
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate);
 
 bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
-- 
2.7.4



[PATCH 1/6] nvme-pci: move clearing host mem behind stopping queues

2018-02-01 Thread Jianchao Wang
Move clearing the host memory buffer behind stopping the queues. This
prepares for the following patch, which will grab all the outstanding
requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6fe7af0..00cffed 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2186,7 +2186,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
if (!dead) {
if (shutdown)
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+   }
+   nvme_stop_queues(&dev->ctrl);
 
+   if (!dead) {
/*
 * If the controller is still alive tell it to stop using the
 * host memory buffer.  In theory the shutdown / reset should
@@ -2195,11 +2198,6 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 */
if (dev->host_mem_descs)
nvme_set_host_mem(dev, 0);
-
-   }
-   nvme_stop_queues(&dev->ctrl);
-
-   if (!dead) {
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-- 
2.7.4



[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Currently, the complicated relationship between nvme_dev_disable
and nvme_timeout has become a devil that introduces many circular
patterns which may trigger deadlocks or I/O hangs. Let's
enumerate the tangles between them:
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before freeing the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when it cancels
   the outstanding requests.

To break them up, let's first look at what kinds of requests
nvme_timeout has to handle.

RESETTING      previous adminq/IOq request
or shutdown    adminq requests from nvme_dev_disable

RECONNECTING   adminq requests from nvme_reset_work

nvme_timeout would have to invoke nvme_dev_disable before completing
all the expired requests above. We avoid this as follows.

For the previous adminq/IOq requests:
use blk_abort_request to force all the outstanding requests to expire
in nvme_dev_disable. In nvme_timeout, set NVME_REQ_CANCELLED and
return BLK_EH_NOT_HANDLED. Then the request will not be completed and
freed, and we need not invoke nvme_dev_disable any more.

blk_abort_request is safe when racing with the irq completion path,
and we are able to grab all the outstanding requests. This
eliminates the race between nvme_timeout and nvme_dev_disable.

We use NVME_REQ_CANCELLED to identify them. After the controller is
totally disabled/shut down, we invoke blk_mq_rq_update_aborted_gstate
to clear the requests and invoke blk_mq_complete_request to complete them.

In addition, to distinguish the previous adminq/IOq requests from the
adminq requests issued by nvme_dev_disable, we introduce
NVME_PCI_OUTSTANDING_GRABBING and NVME_PCI_OUTSTANDING_GRABBED so that
nvme_timeout is able to tell them apart.

For the adminq requests from nvme_dev_disable/nvme_reset_work:
invoke nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and
return BLK_EH_HANDLED. nvme_dev_disable/nvme_reset_work will
see the error.

With this patch, we avoid nvme_dev_disable being invoked by
nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 146 
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
+   int grab_flag;
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool;
unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, blk_status_t error)
blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   u32 csts;
+   bool dead;
+
+   if (!pci_is_enabled(pdev))
+   return;
+
+   csts = readl(dev->bar + NVME_REG_CSTS);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   pdev->error_state  != pci_channel_io_normal);
+   if (!dead)
+   nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 
/*
 * Reset immediately if the controller is failed
+* nvme_dev_disable will take over the expired requests.
 */
if (nvme_should_reset(dev, csts)) {
+   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_warn_reset(dev, csts);
-   nvme_dev_disable(dev, false);
nvme_reset_ctrl(&dev->ctrl);
-   return BLK_EH_HANDLED;
+   return BLK_EH_NOT_HANDLED;
}
 
/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
}
 
/*
-* Shutdown immediately if controller times out while starting. The
-* reset work will see the pci device disabled when it gets the forced
-* cancellation error. All outstanding requests are completed on
-* shutdown, so we return BLK_EH_HANDLED.
+* The previous outstanding requests on adminq and ioq have been
+* grabbed or drained for RECON

[PATCH 0/6]nvme-pci: fixes on nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Hi Christoph, Keith and Sagi

Please consider and comment on the following patchset.
That's really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable.
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before free the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when cancels the
   outstanding requests.
We have found some issues introduced by them; please refer to the following links:

http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html 
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot ensure there are no other issues.

The best way to fix them is to break up the relationship between them.
With this patchset, we can avoid nvme_dev_disable being invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.


There are 6 patches:

Patches 1-3 do some preparation for the 4th one.
The 4th avoids nvme_dev_disable being invoked by nvme_timeout and implements
the synchronization between them. For more details, please refer to the
comment of that patch.
The 5th fixes a bug introduced by the 4th patch: it ensures
nvme_disable_io_queues can only be woken up by the completion path.
The 6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested with a debug patch for some days,
and some bug fixes have been made.
The debug patch and other patches are available in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c  |   3 +-
 drivers/nvme/host/pci.c | 225 ++-
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao



[PATCH 6/6] nvme-pci: suspend queues based on online_queues

2018-02-01 Thread Jianchao Wang
The nvme cq irq is freed based on queue_count. When sq/cq creation
fails, the irq will not have been set up, and free_irq will warn
'Trying to free already-free IRQ'.

To fix it, only increase online_queues when the adminq/sq/cq has been
created and the associated irq is set up. Then suspend queues based
on online_queues.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a838713c..e37f209 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1349,9 +1349,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
nvmeq->cq_vector = -1;
spin_unlock_irq(&nvmeq->q_lock);
 
-   if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-   blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
return 0;
@@ -1495,13 +1492,15 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
nvme_init_queue(nvmeq, qid);
result = queue_request_irq(nvmeq);
if (result < 0)
-   goto release_sq;
+   goto offline;
 
return result;
 
- release_sq:
+offline:
+   dev->online_queues--;
+release_sq:
adapter_delete_sq(dev, qid);
- release_cq:
+release_cq:
adapter_delete_cq(dev, qid);
return result;
 }
@@ -1641,6 +1640,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
result = queue_request_irq(nvmeq);
if (result) {
nvmeq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
 
@@ -1988,6 +1988,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = queue_request_irq(adminq);
if (result) {
adminq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
return nvme_create_io_queues(dev);
@@ -2257,13 +2258,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
int i;
bool dead = true;
struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int onlines;
 
mutex_lock(&dev->shutdown_lock);
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
-   pdev->error_state  != pci_channel_io_normal);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   (pdev->error_state  != pci_channel_io_normal) ||
+   (dev->online_queues == 0));
}
 
/* Just freeze the queue for shutdown case */
@@ -2297,9 +2301,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-   for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
+
+   onlines = dev->online_queues;
+   for (i = onlines - 1; i >= 0; i--)
nvme_suspend_queue(&dev->queues[i]);
 
+   if (dev->ctrl.admin_q)
+   blk_mq_quiesce_queue(dev->ctrl.admin_q);
+
nvme_pci_disable(dev);
 
blk_mq_tagset_busy_iter(&dev->tagset, nvme_pci_cancel_rq, &dev->ctrl);
@@ -2444,12 +2453,12 @@ static void nvme_reset_work(struct work_struct *work)
 * Keep the controller around but remove all namespaces if we don't have
 * any working I/O queue.
 */
-   if (dev->online_queues < 2) {
+   if (dev->online_queues == 1) {
dev_warn(dev->ctrl.device, "IO queues not created\n");
nvme_kill_queues(&dev->ctrl);
nvme_remove_namespaces(&dev->ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
-   } else {
+   } else if (dev->online_queues > 1) {
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-- 
2.7.4



[PATCH] PCI: Add quirk for Cavium Thunder-X2 PCIe erratum #173

2018-02-01 Thread George Cherian
The PCIe controller on Cavium ThunderX2 processors does not
respond to downstream CFG/ECFG cycles when the root port is
in the power-management D3-hot state.

In our tests the above-mentioned erratum causes the following crash when
the downstream endpoint config space is accessed while the root port is
in D3 state.

[   12.775202] Unhandled fault: synchronous external abort (0x96000610) at 0x
[   12.783453] Internal error: : 96000610 [#1] SMP
[   12.787971] Modules linked in: aes_neon_blk ablk_helper cryptd
[   12.793799] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.0-32-generic #34
[   12.800659] Hardware name: Cavium Inc. Unknown/Unknown, BIOS 1.0 01/01/2018
[   12.807607] task: 808f346b8d80 task.stack: 808f346b4000
[   12.813518] PC is at pci_generic_config_read+0x5c/0xf0
[   12.818643] LR is at pci_generic_config_read+0x48/0xf0
[   12.823767] pc : [] lr : [] pstate: 204000c9
[   12.831148] sp : 808f346b7bf0
[   12.834449] x29: 808f346b7bf0 x28: 08e2b848
[   12.839750] x27: 08dc3070 x26: 08d516c0
[   12.845050] x25: 0040 x24: 0937a480
[   12.850351] x23: 006c x22: 
[   12.855651] x21: 808f346b7c84 x20: 0004
[   12.860951] x19: 808f31076000 x18: 
[   12.866251] x17: 1b3613e6 x16: 7f330457
[   12.871551] x15: 67268ad7 x14: 5c6254ac
[   12.876851] x13: f1e100cb x12: 0030
[   12.882151] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   12.887452] x9 : ff656d6e626d686f x8 : 7f7f7f7f7f7f7f7f
[   12.892752] x7 : 808f310da108 x6 : 
[   12.898052] x5 : 0003 x4 : 808f3107a800
[   12.903352] x3 : 0030006c x2 : 0014
[   12.908652] x1 : 2000 x0 : 2030006c
[   12.913952]
[   12.915431] Process swapper/0 (pid: 1, stack limit = 0x808f346b4020)
[   12.922118] Stack: (0x808f346b7bf0 to 0x808f346b8000)
[   12.927850] 7be0:   808f346b7c30 08506e2c
[   12.935665] 7c00: 09185000 006c 808f31076000 808f346b7d14
[   12.943481] 7c20:  08309488 808f346b7c90 085089f4
[   12.951296] 7c40: 0004 808f310d4000  808f346b7d14
[   12.959111] 7c60: 0068 08dc3078 08d604c8 085089d8
[   12.966927] 7c80: 0004 0004080b 808f346b7cd0 08513d28
[   12.974742] 7ca0: 09185000 ffe7 0001 808f310d4000
[   12.982557] 7cc0: 092ae000 808f310d4000 808f346b7d20 085142d4
[   12.990372] 7ce0: 808f310d4000 808f310d4000 09214000 808f310d40b0
[   12.998188] 7d00: 092ae000 808f310d40b0 092ae000 0004080b
[   13.006003] 7d20: 808f346b7d40 08518754  808f310d4000
[   13.013818] 7d40: 808f346b7d80 08d9a974  808f310d4000
[   13.021634] 7d60: 08d9a93c  092ae000 0004080b
[   13.029449] 7d80: 808f346b7da0 08083b4c 09185000 808f346b4000
[   13.037264] 7da0: 808f346b7e30 08d60dfc 00f5 09185000
[   13.045079] 7dc0: 092ae000 0007 092ae000 08dc3078
[   13.052895] 7de0: 08d604c8 08d51600 08dc3070 08e2b720
[   13.060710] 7e00: 091a68d8 08c09678  00070007
[   13.068526] 7e20:  0004080b 808f346b7ea0 08980d90
[   13.076342] 7e40: 08980d78   
[   13.084157] 7e60:    
[   13.091972] 7e80:    0004080b
[   13.099788] 7ea0:  08083690 08980d78 
[   13.107603] 7ec0:    
[   13.115418] 7ee0:    
[   13.123233] 7f00:    
[   13.131048] 7f20:    
[   13.138864] 7f40:    
[   13.146679] 7f60:    
[   13.154494] 7f80:    
[   13.162309] 7fa0:    
[   13.170125] 7fc0:  0005  
[   13.177940] 7fe0:    
[   13.185755] Call trace:
[   13.188190] Ex

Re: [PATCH] ARM: dts: imx53: use PMIC's TSI pins in adc mode

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:28:20PM +0100, Sebastian Reichel wrote:
> PPD uses the PMIC's TSI pins in general purpose ADC mode.
> 
> Signed-off-by: Sebastian Reichel 

s/imx53/imx53-ppd

> ---
>  arch/arm/boot/dts/imx53-ppd.dts | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx53-ppd.dts b/arch/arm/boot/dts/imx53-ppd.dts
> index 123297da43a7..80224fa995f9 100644
> --- a/arch/arm/boot/dts/imx53-ppd.dts
> +++ b/arch/arm/boot/dts/imx53-ppd.dts
> @@ -132,6 +132,14 @@
>   enable-active-high;
>   };
>  
> + reg_tsiref: tsiref {

A better node name should be regulator-tsiref.

> + compatible = "regulator-fixed";
> + regulator-name = "tsiref";
> + regulator-min-microvolt = <250>;
> + regulator-max-microvolt = <250>;
> + regulator-always-on;
> + };
> +
>   pwm_bl: backlight {
>   compatible = "pwm-backlight";
>   pwms = <&pwm2 0 5>;
> @@ -295,6 +303,9 @@
>   interrupts = <12 0x8>;
>   spi-max-frequency = <100>;
>  

This new line can be dropped now.

I fixed up all these, and applied the patch.

Shawn 

> + dlg,tsi-as-adc;
> + tsiref-supply = <®_tsiref>;
> +
>   regulators {
>   buck1_reg: buck1 {
>   regulator-name = "BUCKCORE";
> -- 
> 2.15.1
> 


[PATCH 2/6] nvme-pci: fix the freeze and quiesce for shutdown and reset case

2018-02-01 Thread Jianchao Wang
Currently, the request queues are frozen and quiesced for both the reset
and shutdown cases. This triggers ioq requests in the RECONNECTING
state, which should be avoided to prepare for the following patch.
Just freeze the request queues for the shutdown case and drain all the
residual entered requests after the controller has been shut down.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 00cffed..a7fa397 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2172,21 +2172,23 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   if (dev->ctrl.state == NVME_CTRL_LIVE ||
-   dev->ctrl.state == NVME_CTRL_RESETTING)
-   nvme_start_freeze(&dev->ctrl);
dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
pdev->error_state  != pci_channel_io_normal);
}
 
-   /*
-* Give the controller a chance to complete all entered requests if
-* doing a safe shutdown.
-*/
-   if (!dead) {
-   if (shutdown)
+   /* Just freeze the queue for shutdown case */
+   if (shutdown) {
+   if (dev->ctrl.state == NVME_CTRL_LIVE ||
+   dev->ctrl.state == NVME_CTRL_RESETTING)
+   nvme_start_freeze(&dev->ctrl);
+   /*
+* Give the controller a chance to complete all
+* entered requests if doing a safe shutdown.
+*/
+   if (!dead)
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
}
+
nvme_stop_queues(&dev->ctrl);
 
if (!dead) {
@@ -2210,12 +2212,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
 
/*
-* The driver will not be starting up queues again if shutting down so
-* must flush all entered requests to their failed completion to avoid
-* deadlocking blk-mq hot-cpu notifier.
+* For the shutdown case, the controller will not be set up again soon. If any
+* residual requests remain here, the controller must have gone wrong. Drain and
+* fail all the residual entered IO requests.
 */
-   if (shutdown)
+   if (shutdown) {
nvme_start_queues(&dev->ctrl);
+   nvme_wait_freeze(&dev->ctrl);
+   nvme_stop_queues(&dev->ctrl);
+   }
mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -2349,12 +2354,11 @@ static void nvme_reset_work(struct work_struct *work)
nvme_remove_namespaces(&dev->ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
} else {
-   nvme_start_queues(&dev->ctrl);
-   nvme_wait_freeze(&dev->ctrl);
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-   nvme_unfreeze(&dev->ctrl);
+   if (was_suspend)
+   nvme_unfreeze(&dev->ctrl);
}
 
/*
-- 
2.7.4
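
The reworked flow in this patch — freeze only for the shutdown case, stop the queues, then restart/drain/stop to flush residual requests — can be sketched as a small userspace C model (illustrative only; the struct fields stand in for blk-mq queue state, and none of these names are kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING };

struct model_queue {
	bool frozen;    /* no new requests may enter (nvme_start_freeze) */
	bool quiesced;  /* dispatch stopped (nvme_stop_queues) */
	int in_flight;  /* residual entered requests */
};

static void model_dev_disable(struct model_queue *q, enum ctrl_state state,
			      bool shutdown, bool dead)
{
	/* freeze only for the shutdown case */
	if (shutdown) {
		if (state == CTRL_LIVE || state == CTRL_RESETTING)
			q->frozen = true;
		/* nvme_wait_freeze_timeout(): let entered I/O complete */
		if (!dead)
			q->in_flight = 0;
	}
	q->quiesced = true; /* nvme_stop_queues() */

	/* drain residuals after shutdown: restart dispatch so stuck
	 * requests can fail out, then stop the queues again */
	if (shutdown && q->in_flight) {
		q->quiesced = false; /* nvme_start_queues() */
		q->in_flight = 0;    /* nvme_wait_freeze() */
		q->quiesced = true;  /* nvme_stop_queues() */
	}
}

/* shutdown case: queue ends up frozen, quiesced and drained */
static int model_shutdown_case(void)
{
	struct model_queue q = { false, false, 3 };

	model_dev_disable(&q, CTRL_LIVE, true, false);
	return q.frozen && q.quiesced && q.in_flight == 0;
}

/* reset case: queue is quiesced but never frozen */
static int model_reset_case(void)
{
	struct model_queue q = { false, false, 0 };

	model_dev_disable(&q, CTRL_RESETTING, false, false);
	return !q.frozen && q.quiesced;
}
```

In this model the reset path never freezes, which is exactly what keeps requests from getting stuck in RECONNECTING state.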



Re: [PATCH] xen: fix frontend driver disconnected from xenbus on removal

2018-02-01 Thread Oleksandr Andrushchenko

On 02/01/2018 11:09 PM, Boris Ostrovsky wrote:

On 02/01/2018 03:24 PM, Oleksandr Andrushchenko wrote:


On 02/01/2018 10:08 PM, Boris Ostrovsky wrote:

On 02/01/2018 03:57 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Current xenbus frontend driver removal flow first disconnects
the driver from xenbus and then calls driver's remove callback.
This makes it impossible for the driver to listen to backend's
state changes and synchronize the removal procedure.

Fix this by removing other end XenBus watches after the
driver's remove callback is called.

Signed-off-by: Oleksandr Andrushchenko

---
   drivers/xen/xenbus/xenbus_probe.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 74888cacd0b0..9c63cd3f416b 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -258,11 +258,11 @@ int xenbus_dev_remove(struct device *_dev)
 	DPRINTK("%s", dev->nodename);
 
-	free_otherend_watch(dev);
-
 	if (drv->remove)
 		drv->remove(dev);

Is it possible for the watch to fire here?

Indeed. Yes, it is possible, so we have to somehow protect the removed
driver from being called, e.g. the driver cleans up in its .remove,
but the watch may still trigger the .otherend_changed callback.
Is this what you mean?

(-David who is not at Citrix anymore)

Exactly.

That's why otherend cleanup is split into free_otherend_watch() and
free_otherend_details().

Understood, thank you
The confusion came from patch [1]: in .remove we wait for the backend
to change its state in the .otherend_changed callback and wake us, but
I am not sure how those state changes can be observed if the driver's
watches have already been freed during .remove. So this is why I tried
to play around with free_otherend_watch()...



If so, do you have something neat on your mind how to solve this?

Not necessarily "neat" but perhaps you can use
xenbus_read_otherend_details() in both front and back ends. After all,
IIUIC you are doing something synchronously so you don't really need a
watch.

Yes, I will implement a dedicated flow in the .remove
instead of relying on .otherend_changed

-boris


-boris


+	free_otherend_watch(dev);
+
 	free_otherend_details(dev);
 
 	xenbus_switch_state(dev, XenbusStateClosed);

Thank you,
Oleksandr

Thank you,
Oleksandr

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/xen-netfront.c?id=5b5971df3bc2775107ddad164018a8a8db633b81
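
The ordering being discussed — driver's .remove first, watch teardown after — can be sketched as a toy C model (illustrative only, not the xenbus API):

```c
#include <assert.h>

struct model_xdev {
	int watch_active;
	int events_seen_in_remove;
};

/* the .otherend_changed callback fires only while the watch is registered */
static void model_otherend_changed(struct model_xdev *d)
{
	if (d->watch_active)
		d->events_seen_in_remove++;
}

/* the driver's .remove waits for a backend state-change event */
static void model_drv_remove(struct model_xdev *d)
{
	model_otherend_changed(d); /* backend event arriving mid-remove */
}

/* fixed flow: remove callback first, then free the watch */
static int model_remove_fixed(void)
{
	struct model_xdev d = { 1, 0 };

	model_drv_remove(&d);
	d.watch_active = 0; /* free_otherend_watch() */
	return d.events_seen_in_remove;
}

/* old flow: watch freed first, so remove can never see the event */
static int model_remove_old(void)
{
	struct model_xdev d = { 1, 0 };

	d.watch_active = 0; /* free_otherend_watch() */
	model_drv_remove(&d);
	return d.events_seen_in_remove;
}
```

The model also shows Boris's concern: in the fixed flow the watch can still fire during .remove, so the driver must be prepared for that.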


Is the hisilicon tree maintained ?

2018-02-01 Thread Daniel Lezcano

Hi Wei Xu,

I found in the MAINTAINERS file the hisilicon tree is at:

https://github.com/hisilicon/linux-hisi

But (unless I missed it) I didn't find any update since Nov 2017.

Is that tree maintained?

Thanks!

  -- Daniel


-- 
 Linaro.org │ Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog



[PATCH 3/6] blk-mq: make blk_mq_rq_update_aborted_gstate an external interface

2018-02-01 Thread Jianchao Wang
No functional change, just make blk_mq_rq_update_aborted_gstate an
external interface.

Signed-off-by: Jianchao Wang 
---
 block/blk-mq.c | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 01f271d..a027ca2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -581,7 +581,7 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
 }
 
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
 {
unsigned long flags;
 
@@ -597,6 +597,7 @@ static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
u64_stats_update_end(&rq->aborted_gstate_sync);
local_irq_restore(flags);
 }
+EXPORT_SYMBOL(blk_mq_rq_update_aborted_gstate);
 
 static u64 blk_mq_rq_aborted_gstate(struct request *rq)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49..ad54024 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,6 +257,7 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_complete_request(struct request *rq);
+void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate);
 
 bool blk_mq_queue_stopped(struct request_queue *q);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
-- 
2.7.4



RE: [PATCH] mm/swap: add function get_total_swap_pages to expose total_swap_pages

2018-02-01 Thread He, Roger
Using a limit of total ram * 1/2 seems to work very well.
No OOM, although the swap disk reaches full at peak for the piglit test.

But David noticed that this approach has an obvious defect.
For example, if the platform has 32G system memory and an 8G swap disk,
1/2 * ram = 16G, which is bigger than the swap disk, so no swap for TTM
is allowed at all.
For now we work out an improved version based on get_nr_swap_pages().
Going to send out later.

Thanks
Roger(Hongbo.He)
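
A minimal sketch of such an improved limit, assuming it simply takes the smaller of half the RAM and the currently free swap (the real version based on get_nr_swap_pages() may differ; both inputs are passed in as page counts here, whereas in the kernel they would come from totalram_pages and get_nr_swap_pages()):

```c
#include <assert.h>

/* cap TTM swap usage by what is actually available:
 * min(ram / 2, free swap) */
static long model_ttm_swap_limit(long totalram_pages, long nr_free_swap_pages)
{
	long half_ram = totalram_pages / 2;

	return half_ram < nr_free_swap_pages ? half_ram : nr_free_swap_pages;
}
```

With 32 units of RAM and 8 of swap the cap becomes 8 rather than 16, avoiding the "no swap allowed at all" defect of the fixed ram/2 rule.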
-Original Message-
From: He, Roger 
Sent: Thursday, February 01, 2018 4:03 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; 'He, Roger' 

Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Just now, I tried with a fixed limit, but it does not always work.
For example: with the limit set to 4GB on my platform with 8GB system memory,
it can pass.
But when run on a platform with 16GB system memory, it failed with OOM.

And I guess it also depends on the app's behavior.
I mean, some apps make the OS use more swap space as well.

Thanks
Roger(Hongbo.He)
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
He, Roger
Sent: Thursday, February 01, 2018 1:48 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

But what we could do is to rely on a fixed limit like the Intel driver 
does and I suggested before.
E.g. don't copy anything into a shmemfile when there is only x MB of 
swap space left.

Here I think we can do it further, let the limit value scaling with total 
system memory.
For example: total system memory * 1/2. 
If that it will match the platform configuration better. 

Roger can you test that approach once more with your fix for the OOM 
issues in the page fault handler?

Sure. Using a limit of total ram * 1/2 seems to work very well.
No OOM, although the swap disk reaches full at peak for the piglit test.
I speculate this case happens but without OOM because:

a. after running a while, the swap disk usage gets close to, but not over,
1/2 * total size.
b. all subsequently swapped pages stay in system memory until there is no
space left there. Then the swapped pages in shmem are flushed into the swap
disk. And probably the OS also needs some swap space. In this case it is easy
for the swap disk to get full.
c. but because now free swap size < 1/2 * total, no swap-out happens after
that.
And at least 1/4 * system memory will be left, because the check below in
ttm_mem_global_reserve will ensure that:
	if (zone->used_mem > limit)
		goto out_unlock;

Thanks
Roger(Hongbo.He)
-Original Message-
From: Koenig, Christian
Sent: Wednesday, January 31, 2018 4:13 PM
To: He, Roger ; Zhou, David(ChunMing) ; 
dri-de...@lists.freedesktop.org
Cc: linux...@kvack.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to expose 
total_swap_pages

Yeah, indeed. But what we could do is to rely on a fixed limit like the Intel 
driver does and I suggested before.

E.g. don't copy anything into a shmemfile when there is only x MB of swap space 
left.

Roger can you test that approach once more with your fix for the OOM issues in 
the page fault handler?

Thanks,
Christian.

Am 31.01.2018 um 09:08 schrieb He, Roger:
>   I think this patch isn't need at all. You can directly read 
> total_swap_pages variable in TTM.
>
> Because the variable is not exported by EXPORT_SYMBOL_GPL. So direct using 
> will result in:
> "WARNING: "total_swap_pages" [drivers/gpu/drm/ttm/ttm.ko] undefined!".
>
> Thanks
> Roger(Hongbo.He)
> -Original Message-
> From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On 
> Behalf Of Chunming Zhou
> Sent: Wednesday, January 31, 2018 3:15 PM
> To: He, Roger ; dri-de...@lists.freedesktop.org
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org; Koenig, 
> Christian 
> Subject: Re: [PATCH] mm/swap: add function get_total_swap_pages to 
> expose total_swap_pages
>
> Hi Roger,
>
> I think this patch isn't need at all. You can directly read total_swap_pages 
> variable in TTM. See the comment:
>
> /* protected with swap_lock. reading in vm_swap_full() doesn't need 
> lock */ long total_swap_pages;
>
> there are many places using it directly, you just couldn't change its value. 
> Reading it doesn't need lock.
>
>
> Regards,
>
> David Zhou
>
>
> On 2018年01月29日 16:29, Roger He wrote:
>> ttm module needs it to determine its internal parameter setting.
>>
>> Signed-off-by: Roger He 
>> ---
>>include/linux/swap.h |  6 ++
>>mm/swapfile.c| 15 +++
>>2 files changed, 21 insertions(+)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h index c2b8128..708d66f 100644
>> --

[PATCH 1/6] nvme-pci: move clearing host mem behind stopping queues

2018-02-01 Thread Jianchao Wang
Move clearing the host memory buffer behind stopping the queues, to
prepare for the following patch which will grab all the outstanding
requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6fe7af0..00cffed 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2186,7 +2186,10 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
if (!dead) {
if (shutdown)
nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+   }
+   nvme_stop_queues(&dev->ctrl);
 
+   if (!dead) {
/*
 * If the controller is still alive tell it to stop using the
 * host memory buffer.  In theory the shutdown / reset should
@@ -2195,11 +2198,6 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 */
if (dev->host_mem_descs)
nvme_set_host_mem(dev, 0);
-
-   }
-   nvme_stop_queues(&dev->ctrl);
-
-   if (!dead) {
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-- 
2.7.4



[PATCH 6/6] nvme-pci: suspend queues based on online_queues

2018-02-01 Thread Jianchao Wang
The nvme cq irqs are freed based on queue_count. When sq/cq creation
fails, the irq is never set up, and free_irq will warn 'Trying to free
already-free IRQ'.

To fix it, only increase online_queues when the adminq/sq/cq are
created and the associated irq is set up. Then suspend queues based
on online_queues.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a838713c..e37f209 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1349,9 +1349,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
nvmeq->cq_vector = -1;
spin_unlock_irq(&nvmeq->q_lock);
 
-   if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-   blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
return 0;
@@ -1495,13 +1492,15 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
nvme_init_queue(nvmeq, qid);
result = queue_request_irq(nvmeq);
if (result < 0)
-   goto release_sq;
+   goto offline;
 
return result;
 
- release_sq:
+offline:
+   dev->online_queues--;
+release_sq:
adapter_delete_sq(dev, qid);
- release_cq:
+release_cq:
adapter_delete_cq(dev, qid);
return result;
 }
@@ -1641,6 +1640,7 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
result = queue_request_irq(nvmeq);
if (result) {
nvmeq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
 
@@ -1988,6 +1988,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = queue_request_irq(adminq);
if (result) {
adminq->cq_vector = -1;
+   dev->online_queues--;
return result;
}
return nvme_create_io_queues(dev);
@@ -2257,13 +2258,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
int i;
bool dead = true;
struct pci_dev *pdev = to_pci_dev(dev->dev);
+   int onlines;
 
mutex_lock(&dev->shutdown_lock);
if (pci_is_enabled(pdev)) {
u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-   dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
-   pdev->error_state  != pci_channel_io_normal);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   (pdev->error_state  != pci_channel_io_normal) ||
+   (dev->online_queues == 0));
}
 
/* Just freeze the queue for shutdown case */
@@ -2297,9 +2301,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
nvme_disable_io_queues(dev);
nvme_disable_admin_queue(dev, shutdown);
}
-   for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
+
+   onlines = dev->online_queues;
+   for (i = onlines - 1; i >= 0; i--)
nvme_suspend_queue(&dev->queues[i]);
 
+   if (dev->ctrl.admin_q)
+   blk_mq_quiesce_queue(dev->ctrl.admin_q);
+
nvme_pci_disable(dev);
 
blk_mq_tagset_busy_iter(&dev->tagset, nvme_pci_cancel_rq, &dev->ctrl);
@@ -2444,12 +2453,12 @@ static void nvme_reset_work(struct work_struct *work)
 * Keep the controller around but remove all namespaces if we don't have
 * any working I/O queue.
 */
-   if (dev->online_queues < 2) {
+   if (dev->online_queues == 1) {
dev_warn(dev->ctrl.device, "IO queues not created\n");
nvme_kill_queues(&dev->ctrl);
nvme_remove_namespaces(&dev->ctrl);
new_state = NVME_CTRL_ADMIN_ONLY;
-   } else {
+   } else if (dev->online_queues > 1) {
/* hit this only when allocate tagset fails */
if (nvme_dev_add(dev))
new_state = NVME_CTRL_ADMIN_ONLY;
-- 
2.7.4
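
The accounting rule of this patch — count a queue online only once its irq is set up, and tear down based on that count rather than queue_count — can be sketched as a userspace C model (illustrative names only, not the driver's real structures):

```c
#include <assert.h>

struct model_ctrl {
	int online_queues;
	int suspended;
};

static int model_create_queue(struct model_ctrl *d, int irq_ok)
{
	if (!irq_ok)
		return -1; /* queue_request_irq() failed: not counted online */
	d->online_queues++;
	return 0;
}

static void model_suspend_queues(struct model_ctrl *d)
{
	int i, onlines = d->online_queues;

	/* suspend (and free irqs for) only queues that really came online */
	for (i = onlines - 1; i >= 0; i--)
		d->suspended++;
	d->online_queues = 0;
}

static int model_run_suspend(void)
{
	struct model_ctrl d = { 0, 0 };

	model_create_queue(&d, 1);
	model_create_queue(&d, 1);
	model_create_queue(&d, 0); /* irq setup fails for the third queue */
	model_suspend_queues(&d);
	return d.suspended; /* only the two online queues are suspended */
}
```

A queue whose irq setup failed is never suspended, which is exactly what avoids the 'Trying to free already-free IRQ' warning.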



[PATCH] x86/retpoline: check CONFIG_RETPOLINE option when SPECTRE_V2_CMD_AUTO

2018-02-01 Thread Chen Baozi
Currently, if there is no spectre_v2= or nospectre_v2 specified on the boot
command line, the kernel automatically chooses a mitigation by default.
However, when selecting the auto mode, it does not check whether retpoline
support has been built into the kernel. Thus, if someone built a kernel
without CONFIG_RETPOLINE and booted the system without specifying any
spectre_v2 kernel parameters, the kernel would still report that it had
enabled a minimal retpoline mitigation, which is not the case. This patch
adds a check of the CONFIG_RETPOLINE option in the 'auto' mode to fix it.

Signed-off-by: Chen Baozi 
---
 arch/x86/kernel/cpu/bugs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 390b3dc3d438..70b7d17426eb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -193,7 +193,9 @@ static void __init spectre_v2_select_mitigation(void)
case SPECTRE_V2_CMD_FORCE:
/* FALLTRHU */
case SPECTRE_V2_CMD_AUTO:
-   goto retpoline_auto;
+   if (IS_ENABLED(CONFIG_RETPOLINE))
+   goto retpoline_auto;
+   break;
 
case SPECTRE_V2_CMD_RETPOLINE_AMD:
if (IS_ENABLED(CONFIG_RETPOLINE))
-- 
2.13.5 (Apple Git-94)
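
The fixed selection logic can be sketched in plain C, with CONFIG_RETPOLINE modeled as a runtime parameter so both build configurations can be exercised (illustrative only; the real kernel check is a compile-time IS_ENABLED()):

```c
#include <assert.h>

enum model_cmd { CMD_NONE, CMD_AUTO, CMD_FORCE, CMD_RETPOLINE };
enum model_mode { MODE_NONE, MODE_RETPOLINE_GENERIC };

static enum model_mode model_select_mitigation(enum model_cmd cmd,
					       int config_retpoline)
{
	switch (cmd) {
	case CMD_NONE:
		return MODE_NONE;
	case CMD_FORCE:
		/* FALLTHRU */
	case CMD_AUTO:
		/* the fix: take the retpoline path only when built with
		 * CONFIG_RETPOLINE, otherwise report no mitigation */
		if (config_retpoline)
			return MODE_RETPOLINE_GENERIC;
		return MODE_NONE;
	case CMD_RETPOLINE:
		if (config_retpoline)
			return MODE_RETPOLINE_GENERIC;
		return MODE_NONE;
	}
	return MODE_NONE;
}
```

Without the config check, the auto case would unconditionally report a retpoline mode even on a kernel built without it.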



[PATCH 4/6] nvme-pci: break up nvme_timeout and nvme_dev_disable

2018-02-01 Thread Jianchao Wang
Currently, the complicated relationship between nvme_dev_disable
and nvme_timeout has become a devil that introduces many
circular patterns which may trigger deadlocks or IO hangs. Let's
enumerate the tangles between them:
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before free the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when cancels the
   outstanding requests.

To break them up, let's first look at what kinds of requests
nvme_timeout has to handle.

RESETTING      previous adminq/IOq requests,
or shutdown    adminq requests from nvme_dev_disable

RECONNECTING   adminq requests from nvme_reset_work

nvme_timeout has to invoke nvme_dev_disable first before complete
all the expired request above. We avoid this as following.

For the previous adminq/IOq request:
use blk_abort_request to force all the outstanding requests expired
in nvme_dev_disable. In nvme_timeout, set NVME_REQ_CANCELLED and
return BLK_EH_NOT_HANDLED. Then the request will not be completed and
freed. We needn't invoke nvme_dev_disable any more.

blk_abort_request is safe when racing with the irq completion path.
With it we are able to grab all the outstanding requests. This
eliminates the race between nvme_timeout and nvme_dev_disable.

We use NVME_REQ_CANCELLED to identify them. After the controller is
totally disabled/shutdown, we invoke blk_mq_rq_update_aborted_gstate
to clear requests and invoke blk_mq_complete_request to complete them.

In addition, to identify the previous adminq/IOq request and adminq
requests from nvme_dev_disable, we introduce NVME_PCI_OUTSTANDING_GRABBING
and NVME_PCI_OUTSTANDING_GRABBED to let nvme_timeout be able to
distinguish them.

For the adminq requests from nvme_dev_disable/nvme_reset_work:
invoke nvme_disable_ctrl directly, then set NVME_REQ_CANCELLED and
return BLK_EH_HANDLED. nvme_dev_disable/nvme_reset_work will
see the error.

With this patch, we could avoid nvme_dev_disable to be invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 146 
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7fa397..5b192b0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -70,6 +70,8 @@ struct nvme_queue;
 
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
+#define NVME_PCI_OUTSTANDING_GRABBING 1
+#define NVME_PCI_OUTSTANDING_GRABBED 2
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -80,6 +82,7 @@ struct nvme_dev {
struct blk_mq_tag_set admin_tagset;
u32 __iomem *dbs;
struct device *dev;
+   int grab_flag;
struct dma_pool *prp_page_pool;
struct dma_pool *prp_small_pool;
unsigned online_queues;
@@ -1130,6 +1133,23 @@ static void abort_endio(struct request *req, blk_status_t error)
blk_mq_free_request(req);
 }
 
+static void nvme_pci_disable_ctrl_directly(struct nvme_dev *dev)
+{
+   struct pci_dev *pdev = to_pci_dev(dev->dev);
+   u32 csts;
+   bool dead;
+
+   if (!pci_is_enabled(pdev))
+   return;
+
+   csts = readl(dev->bar + NVME_REG_CSTS);
+   dead = !!((csts & NVME_CSTS_CFS) ||
+   !(csts & NVME_CSTS_RDY) ||
+   pdev->error_state  != pci_channel_io_normal);
+   if (!dead)
+   nvme_disable_ctrl(&dev->ctrl, dev->ctrl.cap);
+}
+
 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 {
 
@@ -1191,12 +1211,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 
/*
 * Reset immediately if the controller is failed
+* nvme_dev_disable will take over the expired requests.
 */
if (nvme_should_reset(dev, csts)) {
+   nvme_req(req)->flags |= NVME_REQ_CANCELLED;
nvme_warn_reset(dev, csts);
-   nvme_dev_disable(dev, false);
nvme_reset_ctrl(&dev->ctrl);
-   return BLK_EH_HANDLED;
+   return BLK_EH_NOT_HANDLED;
}
 
/*
@@ -1210,38 +1231,51 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
}
 
/*
-* Shutdown immediately if controller times out while starting. The
-* reset work will see the pci device disabled when it gets the forced
-* cancellation error. All outstanding requests are completed on
-* shutdown, so we return BLK_EH_HANDLED.
+* The previous outstanding requests on adminq and ioq have been
+* grabbed or drained for RECON
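
The grab-and-complete scheme described in the changelog above can be sketched as a userspace C model (illustrative; the kernel functions named in the comments are being modeled, not invoked):

```c
#include <assert.h>

enum { MODEL_EH_HANDLED, MODEL_EH_NOT_HANDLED };

struct model_req {
	int cancelled;
	int completed;
};

/* timeout path: no longer disables the device itself; it marks the
 * request cancelled and hands ownership to the disable path */
static int model_timeout(struct model_req *rq)
{
	rq->cancelled = 1;           /* NVME_REQ_CANCELLED */
	return MODEL_EH_NOT_HANDLED; /* dev_disable will complete it */
}

static int model_grab_and_complete(struct model_req *reqs, int n)
{
	int i, done = 0;

	/* blk_abort_request(): force every outstanding request to expire */
	for (i = 0; i < n; i++)
		model_timeout(&reqs[i]);

	/* after shutdown, complete each grabbed request exactly once
	 * (blk_mq_rq_update_aborted_gstate + blk_mq_complete_request) */
	for (i = 0; i < n; i++) {
		if (reqs[i].cancelled && !reqs[i].completed) {
			reqs[i].completed = 1;
			done++;
		}
	}
	return done;
}

static int model_run_grab(void)
{
	struct model_req reqs[3] = { {0, 0}, {0, 0}, {0, 0} };

	return model_grab_and_complete(reqs, 3);
}
```

Because every request is completed exactly once by the disable path, the model has no double-completion race between timeout and disable.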

[no subject]

2018-02-01 Thread Jianchao Wang
Hi Christoph, Keith and Sagi

Please consider and comment on the following patchset.
That's really appreciated.

There is a complicated relationship between nvme_timeout and nvme_dev_disable.
 - nvme_timeout has to invoke nvme_dev_disable to stop the
   controller doing DMA access before free the request.
 - nvme_dev_disable has to depend on nvme_timeout to complete
   adminq requests to set HMB or delete sq/cq when the controller
   has no response.
 - nvme_dev_disable will race with nvme_timeout when cancels the
   outstanding requests.
We have found some issues introduced by them; please refer to the following links:

http://lists.infradead.org/pipermail/linux-nvme/2018-January/015053.html 
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015276.html
http://lists.infradead.org/pipermail/linux-nvme/2018-January/015328.html
Even so, we cannot be sure there are no other issues.

The best way to fix them is to break up the relationship between them.
With this patch, we could avoid nvme_dev_disable to be invoked
by nvme_timeout and eliminate the race between nvme_timeout and
nvme_dev_disable on outstanding requests.


There are 6 patches:

The 1st to 3rd patches do some preparation for the 4th one.
The 4th avoids nvme_dev_disable being invoked by nvme_timeout, and implements
the synchronization between them. For more details, please refer to the comment
of that patch.
The 5th fixes a bug introduced by the 4th patch: it lets nvme_disable_io_queues
be woken up only by the completion path.
The 6th fixes a bug found during testing; it is not related to the 4th patch.

This patchset was tested with a debug patch for some days, and some bug
fixes have been made. The debug patch and the other patches are available
in the following git branch:
https://github.com/jianchwa/linux-blcok.git nvme_fixes_test

Jianchao Wang (6)
0001-nvme-pci-move-clearing-host-mem-behind-stopping-queu.patch
0002-nvme-pci-fix-the-freeze-and-quiesce-for-shutdown-and.patch
0003-blk-mq-make-blk_mq_rq_update_aborted_gstate-a-extern.patch
0004-nvme-pci-break-up-nvme_timeout-and-nvme_dev_disable.patch
0005-nvme-pci-discard-wait-timeout-when-delete-cq-sq.patch
0006-nvme-pci-suspend-queues-based-on-online_queues.patch

diff stat following:
 block/blk-mq.c  |   3 +-
 drivers/nvme/host/pci.c | 225 ++-
 include/linux/blk-mq.h  |   1 +
 3 files changed, 169 insertions(+), 60 deletions(-)

Thanks
Jianchao



[PATCH 5/6] nvme-pci: discard wait timeout when delete cq/sq

2018-02-01 Thread Jianchao Wang
Currently, nvme_disable_io_queues can be woken up by both the request
completion and the wait timeout paths. This is unnecessary and can
introduce a race between nvme_dev_disable and the request timeout path.
When a delete cq/sq command expires, nvme_disable_io_queues will also be
woken up and return to nvme_dev_disable, which then handles the
outstanding requests. This races with the request timeout path.

To fix it, just use wait_for_completion instead of the timed-out variant.
The request timeout path will wake it up.

Signed-off-by: Jianchao Wang 
---
 drivers/nvme/host/pci.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5b192b0..a838713c 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2048,7 +2048,6 @@ static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 static void nvme_disable_io_queues(struct nvme_dev *dev)
 {
int pass, queues = dev->online_queues - 1;
-   unsigned long timeout;
u8 opcode = nvme_admin_delete_sq;
 
for (pass = 0; pass < 2; pass++) {
@@ -2056,15 +2055,12 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 
reinit_completion(&dev->ioq_wait);
  retry:
-   timeout = ADMIN_TIMEOUT;
for (; i > 0; i--, sent++)
if (nvme_delete_queue(&dev->queues[i], opcode))
break;
 
while (sent--) {
-   timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
-   if (timeout == 0)
-   return;
+   wait_for_completion(&dev->ioq_wait);
if (i)
goto retry;
}
-- 
2.7.4
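
The effect of dropping the timed wait can be sketched single-threaded in C: the waiter consumes a stream of wakeup events, and with a plain wait_for_completion() only completion events can ever wake it — timer events are simply not a wakeup source (a toy model, not the kernel API):

```c
#include <assert.h>

enum model_event { EV_COMPLETION, EV_TIMER };

/* plain wait_for_completion(): consumes events until a completion;
 * a timer event cannot wake it */
static int model_wait(const enum model_event *ev, int n, int *pos)
{
	while (*pos < n) {
		enum model_event e = ev[(*pos)++];

		if (e == EV_COMPLETION)
			return 0; /* woken by the completion path */
		/* EV_TIMER: ignored, a plain wait cannot time out */
	}
	return -1; /* would sleep forever: no completion ever arrived */
}

static int model_disable_io_queues(const enum model_event *ev, int n, int sent)
{
	int pos = 0, woken = 0;

	while (sent--) {
		if (model_wait(ev, n, &pos) == 0)
			woken++;
	}
	return woken;
}

static int model_run_events(void)
{
	/* a stray timer fires first; only the two completions count */
	const enum model_event ev[] = { EV_TIMER, EV_COMPLETION, EV_COMPLETION };

	return model_disable_io_queues(ev, 3, 2);
}
```

With the timed variant, the EV_TIMER event could have returned control to nvme_dev_disable early and raced with the late completions; here it is inert.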



Re: [PATCH] ARM: dts: imx53: Add touchscreen reset line

2018-02-01 Thread Shawn Guo
On Mon, Jan 15, 2018 at 03:24:52PM +0100, Sebastian Reichel wrote:
> From: Martyn Welch 
> 
Utilise new support in the Atmel MaxTouch driver to drive the
touchscreen controller's reset line correctly.
> 
> Signed-off-by: Martyn Welch 
> Signed-off-by: Sebastian Reichel 

s/imx53/imx53-ppd in subject.

I fixed it up and applied the patch.

Shawn


[patch v1 4/4] platform/x86: mlx-platform: Add support for new Mellanox systems

2018-02-01 Thread Vadim Pasternak
Add support for the following new Mellanox system types: msn274x, msn201x,
qmb7, sn34, sn37. The current members of these types are:
- MSN2740 (32x100GbE Ethernet switch with cost reduction);
- MSN2010 (18x10GbE plus 4x4x25GbE);
- QMB700 (40x200GbE InfiniBand switch);
- SN3700 (32x200GbE and 16x400GbE Ethernet switch);
- SN3410 (6x400GbE plus 48x50GbE Ethernet switch).

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 298 
 1 file changed, 298 insertions(+)

diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c
index 4d8078d..94b0bfc 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -83,6 +83,7 @@
 #define MLXPLAT_CPLD_PSU_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_PWR_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_FAN_MASK  GENMASK(3, 0)
+#define MLXPLAT_CPLD_FAN_NG_MASK   GENMASK(5, 0)
 
 /* Start channel numbers */
 #define MLXPLAT_CPLD_CH1   2
@@ -94,6 +95,7 @@
 /* Hotplug devices adapter numbers */
 #define MLXPLAT_CPLD_NR_NONE   -1
 #define MLXPLAT_CPLD_PSU_DEFAULT_NR10
+#define MLXPLAT_CPLD_PSU_MSN_NR4
 #define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
 #define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
 #define MLXPLAT_CPLD_FAN3_DEFAULT_NR   13
@@ -335,6 +337,225 @@ struct mlxreg_core_hotplug_platform_data mlxplat_mlxcpld_msn21xx_data = {
.mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
 };
 
+/* Platform hotplug MSN201x system family data */
+static struct mlxreg_core_data mlxplat_mlxcpld_msn201x_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+};
+
+static struct mlxreg_core_item mlxplat_mlxcpld_msn201x_items[] = {
+   {
+   .data = mlxplat_mlxcpld_msn201x_pwr_items_data,
+   .aggr_mask = MLXPLAT_CPLD_AGGR_PWR_MASK_DEF,
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = MLXPLAT_CPLD_PWR_MASK,
+   .count = ARRAY_SIZE(mlxplat_mlxcpld_msn201x_pwr_items_data),
+   .inversed = 0,
+   .health = false,
+   },
+};
+
+static
+struct mlxreg_core_hotplug_platform_data mlxplat_mlxcpld_msn201x_data = {
+   .items = mlxplat_mlxcpld_msn201x_items,
+   .counter = ARRAY_SIZE(mlxplat_mlxcpld_msn201x_items),
+   .cell = MLXPLAT_CPLD_LPC_REG_AGGR_OFFSET,
+   .mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
+   .cell_low = MLXPLAT_CPLD_LPC_REG_AGGRLO_OFFSET,
+   .mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
+};
+
+/* Platform hotplug next generation system family data */
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_psu_items_data[] = {
+   {
+   .label = "psu1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
+   .mask = BIT(0),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_psu[0],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+   {
+   .label = "psu2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
+   .mask = BIT(1),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_psu[1],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+};
+
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_pwr[0],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.brdinfo = &mlxplat_mlxcpld_pwr[1],
+   .hpdev.nr = MLXPLAT_CPLD_PSU_MSN_NR,
+   },
+};
+
+static struct mlxreg_core_data mlxplat_mlxcpld_default_ng_fan_items_data[] = {
+   {
+   .label = "fan1",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan2",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan3",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(2),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "fan4",
+   .reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
+   .mask = BIT(3),
+   

[patch v1 3/4] platform/x86: mlx-platform: Fix power cable setting for systems from msn21xx family

2018-02-01 Thread Vadim Pasternak
Add a dedicated structure with the power cable setting for Mellanox
systems from the msn21xx family. These systems do not have a physical
device for the power unit controller, so when a power cable is inserted
or removed, the relevant interrupt signal is handled and the status is
updated, but no device is associated with this signal.

Add a definition for the low interrupt aggregation signal. On systems
from the msn21xx family, the low aggregation mask should be cleared in
order to allow the signal to reach the CPU.

Fixes: 6613d18e9038 ("platform/x86: mlx-platform: Move module from arch/x86")
Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c
index 177b40a..4d8078d 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -77,6 +77,8 @@
 #define MLXPLAT_CPLD_AGGR_FAN_MASK_DEF 0x40
 #define MLXPLAT_CPLD_AGGR_MASK_DEF (MLXPLAT_CPLD_AGGR_PSU_MASK_DEF | \
 MLXPLAT_CPLD_AGGR_FAN_MASK_DEF)
+#define MLXPLAT_CPLD_AGGR_MASK_NG_DEF  0x04
+#define MLXPLAT_CPLD_LOW_AGGR_MASK_LOW 0xc0
 #define MLXPLAT_CPLD_AGGR_MASK_MSN21XX 0x04
 #define MLXPLAT_CPLD_PSU_MASK  GENMASK(1, 0)
 #define MLXPLAT_CPLD_PWR_MASK  GENMASK(1, 0)
@@ -295,14 +297,29 @@ struct mlxreg_core_hotplug_platform_data mlxplat_mlxcpld_default_data = {
.mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
 };
 
+static struct mlxreg_core_data mlxplat_mlxcpld_msn21xx_pwr_items_data[] = {
+   {
+   .label = "pwr1",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(0),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+   {
+   .label = "pwr2",
+   .reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
+   .mask = BIT(1),
+   .hpdev.nr = MLXPLAT_CPLD_NR_NONE,
+   },
+};
+
 /* Platform hotplug MSN21xx system family data */
 static struct mlxreg_core_item mlxplat_mlxcpld_msn21xx_items[] = {
{
-   .data = mlxplat_mlxcpld_default_pwr_items_data,
+   .data = mlxplat_mlxcpld_msn21xx_pwr_items_data,
.aggr_mask = MLXPLAT_CPLD_AGGR_PWR_MASK_DEF,
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = MLXPLAT_CPLD_PWR_MASK,
-   .count = ARRAY_SIZE(mlxplat_mlxcpld_pwr),
+   .count = ARRAY_SIZE(mlxplat_mlxcpld_msn21xx_pwr_items_data),
.inversed = 0,
.health = false,
},
@@ -314,6 +331,8 @@ struct mlxreg_core_hotplug_platform_data mlxplat_mlxcpld_msn21xx_data = {
.counter = ARRAY_SIZE(mlxplat_mlxcpld_msn21xx_items),
.cell = MLXPLAT_CPLD_LPC_REG_AGGR_OFFSET,
.mask = MLXPLAT_CPLD_AGGR_MASK_DEF,
+   .cell_low = MLXPLAT_CPLD_LPC_REG_AGGRLO_OFFSET,
+   .mask_low = MLXPLAT_CPLD_LOW_AGGR_MASK_LOW,
 };
 
 static bool mlxplat_mlxcpld_writeable_reg(struct device *dev, unsigned int reg)
-- 
2.1.4



[patch v1 2/4] platform/x86: mlx-platform: Add define for the negative bus

2018-02-01 Thread Vadim Pasternak
Add a define for the negative bus Id, for use when no hotplug device is
associated with a hotplug interrupt signal. In this case the signal is
handled by the mlxreg-hotplug driver, but no device is associated with
it.

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c
index a1ae93d..177b40a 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -90,6 +90,7 @@
 #define MLXPLAT_CPLD_LPC_MUX_DEVS  2
 
 /* Hotplug devices adapter numbers */
+#define MLXPLAT_CPLD_NR_NONE   -1
 #define MLXPLAT_CPLD_PSU_DEFAULT_NR10
 #define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
 #define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
-- 
2.1.4



[patch v1 0/4] mlx-platform: Add support for new Mellanox systems, code improvement, fixes for msn21xx system

2018-02-01 Thread Vadim Pasternak
The patchset:
- adds defines for the bus numbers used in the system topology description;
- fixes the power cable definition for the msn21xx system family;
- introduces support for new Mellanox systems.

Vadim Pasternak (4):
  platform/x86: mlx-platform: Use defines for bus assignment
  platform/x86: mlx-platform: Add define for the negative bus
  platform/x86: mlx-platform: Fix power cable setting for systems from
msn21xx family
  platform/x86: mlx-platform: Add support for new Mellanox systems

 drivers/platform/x86/mlx-platform.c | 345 ++--
 1 file changed, 335 insertions(+), 10 deletions(-)

-- 
2.1.4



[patch v1 1/4] platform/x86: mlx-platform: Use defines for bus assignment

2018-02-01 Thread Vadim Pasternak
Add defines for the bus Ids used in the hotplug devices topology, in
order to improve code readability. Defines are added for the FAN and
power units.

Signed-off-by: Vadim Pasternak 
---
 drivers/platform/x86/mlx-platform.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/platform/x86/mlx-platform.c b/drivers/platform/x86/mlx-platform.c
index dfecba4..a1ae93d 100644
--- a/drivers/platform/x86/mlx-platform.c
+++ b/drivers/platform/x86/mlx-platform.c
@@ -89,6 +89,13 @@
 /* Number of LPC attached MUX platform devices */
 #define MLXPLAT_CPLD_LPC_MUX_DEVS  2
 
+/* Hotplug devices adapter numbers */
+#define MLXPLAT_CPLD_PSU_DEFAULT_NR10
+#define MLXPLAT_CPLD_FAN1_DEFAULT_NR   11
+#define MLXPLAT_CPLD_FAN2_DEFAULT_NR   12
+#define MLXPLAT_CPLD_FAN3_DEFAULT_NR   13
+#define MLXPLAT_CPLD_FAN4_DEFAULT_NR   14
+
 /* mlxplat_priv - platform private data
  * @pdev_i2c - i2c controller platform device
  * @pdev_mux - array of mux platform devices
@@ -190,14 +197,14 @@ static struct mlxreg_core_data mlxplat_mlxcpld_default_psu_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_psu[0],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
{
.label = "psu2",
.reg = MLXPLAT_CPLD_LPC_REG_PSU_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_psu[1],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
 };
 
@@ -207,14 +214,14 @@ static struct mlxreg_core_data mlxplat_mlxcpld_default_pwr_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_pwr[0],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
{
.label = "pwr2",
.reg = MLXPLAT_CPLD_LPC_REG_PWR_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_pwr[1],
-   .hpdev.nr = 10,
+   .hpdev.nr = MLXPLAT_CPLD_PSU_DEFAULT_NR,
},
 };
 
@@ -224,28 +231,28 @@ static struct mlxreg_core_data mlxplat_mlxcpld_default_fan_items_data[] = {
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(0),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[0],
-   .hpdev.nr = 11,
+   .hpdev.nr = MLXPLAT_CPLD_FAN1_DEFAULT_NR,
},
{
.label = "fan2",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(1),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[1],
-   .hpdev.nr = 12,
+   .hpdev.nr = MLXPLAT_CPLD_FAN2_DEFAULT_NR,
},
{
.label = "fan3",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(2),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[2],
-   .hpdev.nr = 13,
+   .hpdev.nr = MLXPLAT_CPLD_FAN3_DEFAULT_NR,
},
{
.label = "fan4",
.reg = MLXPLAT_CPLD_LPC_REG_FAN_OFFSET,
.mask = BIT(3),
.hpdev.brdinfo = &mlxplat_mlxcpld_fan[3],
-   .hpdev.nr = 14,
+   .hpdev.nr = MLXPLAT_CPLD_FAN4_DEFAULT_NR,
},
 };
 
-- 
2.1.4



[GIT PULL] arch/microblaze patches for 4.16-rc1

2018-02-01 Thread Michal Simek
Hi,

please pull the following fixes to your tree.

Thanks,
Michal


The following changes since commit a8750ddca918032d6349adbf9a4b6555e7db20da:

  Linux 4.15-rc8 (2018-01-14 15:32:30 -0800)

are available in the git repository at:

  git://git.monstr.eu/linux-2.6-microblaze.git tags/microblaze-4.16-rc1

for you to fetch changes up to 7b6ce52be3f86520524711a6f33f3866f9339694:

  microblaze: Setup proper dependency for optimized lib functions
(2018-01-22 11:24:14 +0100)


Microblaze patches for 4.16-rc1

- Fix endian handling and Kconfig dependency
- Fix iounmap prototype


Arnd Bergmann (2):
  microblaze: fix endian handling
  microblaze: fix iounmap prototype

Michal Simek (1):
  microblaze: Setup proper dependency for optimized lib functions

 arch/microblaze/Kconfig.platform |  1 +
 arch/microblaze/Makefile | 17 +++--
 arch/microblaze/include/asm/io.h |  2 +-
 arch/microblaze/mm/pgtable.c |  2 +-
 4 files changed, 14 insertions(+), 8 deletions(-)



-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP SoCs






Re: [PATCH] socket: Provide bounce buffer for constant sized put_cmsg()

2018-02-01 Thread kbuild test robot
Hi Kees,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kees-Cook/socket-Provide-bounce-buffer-for-constant-sized-put_cmsg/20180202-113637
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/bluetooth/hci_sock.c:1406:17: sparse: incorrect type in initializer 
>> (invalid types) @@ expected void _val @@ got void _val @@
   net/bluetooth/hci_sock.c:1406:17: expected void _val
   net/bluetooth/hci_sock.c:1406:17: got void 
>> net/bluetooth/hci_sock.c:1406:17: sparse: expression using sizeof(void)
   In file included from include/linux/compat.h:16:0,
from include/linux/ethtool.h:17,
from include/linux/netdevice.h:41,
from include/net/sock.h:51,
from include/net/bluetooth/bluetooth.h:29,
from net/bluetooth/hci_sock.c:32:
   net/bluetooth/hci_sock.c: In function 'hci_sock_cmsg':
   include/linux/socket.h:355:19: error: variable or field '_val' declared void
_val = 14- ^
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~
   include/linux/socket.h:355:26: warning: dereferencing 'void *' pointer
_val = 20- ^~~
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~
   include/linux/socket.h:355:26: error: void value not ignored as it ought to 
be
_val = 26- ^
   net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
^~~~

vim +1406 net/bluetooth/hci_sock.c

767c5eb5 Marcel Holtmann 2007-09-09  1405  
767c5eb5 Marcel Holtmann 2007-09-09 @1406   put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
a61bbcf2 Patrick McHardy 2005-08-14  1407   }
^1da177e Linus Torvalds  2005-04-16  1408  }
^1da177e Linus Torvalds  2005-04-16  1409  

:: The code at line 1406 was first introduced by commit
:: 767c5eb5d35aeb85987143f0a730bc21d3ecfb3d [Bluetooth] Add compat handling 
for timestamp structure

:: TO: Marcel Holtmann 
:: CC: Marcel Holtmann 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v6 17/41] dt-bindings: clock: Add bindings for DA8XX CFGCHIP clocks

2018-02-01 Thread Sekhar Nori
On Saturday 20 January 2018 10:43 PM, David Lechner wrote:
> +EMIFA clock source (ASYNC1)
> +---
> +Required properties:
> +- compatible: shall be "ti,da850-async1-clksrc".
> +- #clock-cells: from common clock binding; shall be set to 0.
> +- clocks: phandles to the parent clocks corresponding to clock-names
> +- clock-names: shall be "pll0_sysclk3", "div4.5"

Is this clock really referred to as async1 in documentation? I don't get
hits for async1 in OMAP-L138 TRM.

Thanks,
Sekhar


Re: possible deadlock in get_user_pages_unlocked

2018-02-01 Thread Al Viro
On Fri, Feb 02, 2018 at 05:46:26AM +, Al Viro wrote:
> On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:
> 
> > Try starting up multiple instances of the program; that sometimes helps with
> > these races that are hard to hit (since you may e.g. have a different 
> > number of
> > CPUs than syzbot used).  If I start up 4 instances I see the lockdep splat 
> > after
> > around 2-5 seconds.
> 
> 5 instances in parallel, 10 minutes into the run...
> 
> >  This is on latest Linus tree (4bf772b1467).  Also note the
> > reproducer uses KVM, so if you're running it in a VM it will only work if 
> > you've
> > enabled nested virtualization on the host (kvm_intel.nested=1).
> 
> cat /sys/module/kvm_amd/parameters/nested 
> 1
> 
> on host
> 
> > Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> > get_user_page_nowait() to get_user_pages_unlocked()").
> 
> That simply prevents this reproducer hitting get_user_pages_unlocked()
> instead of grab mmap_sem/get_user_pages/drop mmap_sem.  I.e. does not
> allow __get_user_pages_locked() to drop/regain ->mmap_sem.
> 
> The bug may be in the way we call get_user_pages_unlocked() in that
> commit, but it might easily be a bug in __get_user_pages_locked()
> exposed by that reproducer somehow.

I think I understand what's going on.  FOLL_NOWAIT handling is a serious
mess ;-/  I'll probably have something to test tomorrow - I still can't
reproduce it here, unfortunately.


Re: [PATCH v2 1/7] ARM: imx: add timer stop flag to ARM power off state

2018-02-01 Thread Shawn Guo
On Wed, Jan 10, 2018 at 10:04:47PM +0100, Stefan Agner wrote:
> When the CPU is in ARM power off state the ARM architected
> timers are stopped. The flag is already present in the higher
> power WAIT mode.
> 
> This allows to use the ARM generic timer on i.MX 6UL/6ULL SoC.
> Without the flag the kernel freezes when the timer enters the
> first time ARM power off mode.
> 
> Note: The default timer on i.MX6SX is the i.MX GPT timer which is
> not disabled during CPU idle. However, the timer is not affected
> by the CPUIDLE_FLAG_TIMER_STOP flag. The flag only affects CPU
> local timers.
> 
> Cc: Anson Huang 
> Signed-off-by: Stefan Agner 
> Reviewed-by: Lucas Stach 

Applied all, thanks.


[V2][PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()

2018-02-01 Thread Haiqing Bai
From: Shigeru Yoshida 

Running io_watchdog_func() while ohci_urb_enqueue() is running can
cause a race condition where ohci->prev_frame_no is corrupted and the
watchdog can mis-detect the following error:

  ohci-platform 664a0800.usb: frame counter not updating; disabled
  ohci-platform 664a0800.usb: HC died; cleaning up

Specifically, the following scenario causes the race condition:

  1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
 and enters the critical section
  2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
 returns false
  3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
 read by ohci_frame_no(ohci)
  4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
  5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
 flags) and exits the critical section
  6. Later, ohci_urb_enqueue() is called
  7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
 and enters the critical section
  8. The timer scheduled on step 4 expires and io_watchdog_func() runs
  9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
 and waits on it because ohci_urb_enqueue() is already in the
 critical section on step 7
 10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
 returns false
 11. ohci_urb_enqueue() sets ohci->prev_frame_no to new frame number
 read by ohci_frame_no(ohci) because the frame number proceeded
 between step 3 and 6
 12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
 13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
 flags) and exits the critical section, then wake up
 io_watchdog_func() which is waiting on step 9
 14. io_watchdog_func() enters the critical section
 15. io_watchdog_func() calls ohci_frame_no(ohci) and set frame_no
 variable to the frame number
 16. io_watchdog_func() compares frame_no and ohci->prev_frame_no

In step 16, because this invocation of io_watchdog_func() was scheduled
in step 4, the frame number in ohci->prev_frame_no is expected to be the
one set in step 3.  However, ohci->prev_frame_no was overwritten in step
11.  Because step 16 executes soon after step 11, the frame number might
not have advanced, so ohci->prev_frame_no can equal frame_no and the
watchdog falsely reports that the frame counter is not updating.

To address the above scenario, this patch introduces a special sentinel
value IO_WATCHDOG_OFF and sets this value in ohci->prev_frame_no when
the watchdog is not pending or running.  When ohci_urb_enqueue()
schedules the watchdog (step 4 and 12 above), it compares
ohci->prev_frame_no to IO_WATCHDOG_OFF so that ohci->prev_frame_no is
not overwritten while io_watchdog_func() is running.

v2: Instead of adding an extra flag variable, define IO_WATCHDOG_OFF
as a special sentinel value for prev_frame_no.

Signed-off-by: Shigeru Yoshida 
Signed-off-by: Haiqing Bai 
---
 drivers/usb/host/ohci-hcd.c | 10 +++---
 drivers/usb/host/ohci-hub.c |  4 +++-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/ohci-hcd.c b/drivers/usb/host/ohci-hcd.c
index ee96763..84f88fa 100644
--- a/drivers/usb/host/ohci-hcd.c
+++ b/drivers/usb/host/ohci-hcd.c
@@ -74,6 +74,7 @@
 
 #defineSTATECHANGE_DELAY   msecs_to_jiffies(300)
 #defineIO_WATCHDOG_DELAY   msecs_to_jiffies(275)
+#defineIO_WATCHDOG_OFF 0xff00
 
 #include "ohci.h"
 #include "pci-quirks.h"
@@ -231,7 +232,7 @@ static int ohci_urb_enqueue (
}
 
/* Start up the I/O watchdog timer, if it's not running */
-   if (!timer_pending(&ohci->io_watchdog) &&
+   if (ohci->prev_frame_no == IO_WATCHDOG_OFF &&
list_empty(&ohci->eds_in_use) &&
!(ohci->flags & OHCI_QUIRK_QEMU)) {
ohci->prev_frame_no = ohci_frame_no(ohci);
@@ -501,6 +502,7 @@ static int ohci_init (struct ohci_hcd *ohci)
return 0;
 
timer_setup(&ohci->io_watchdog, io_watchdog_func, 0);
+   ohci->prev_frame_no = IO_WATCHDOG_OFF;
 
ohci->hcca = dma_alloc_coherent (hcd->self.controller,
sizeof(*ohci->hcca), &ohci->hcca_dma, GFP_KERNEL);
@@ -730,7 +732,7 @@ static void io_watchdog_func(struct timer_list *t)
u32 head;
struct ed   *ed;
struct td   *td, *td_start, *td_next;
-   unsignedframe_no;
+   unsignedframe_no, prev_frame_no = IO_WATCHDOG_OFF;
unsigned long   flags;
 
spin_lock_irqsave(&ohci->lock, flags);
@@ -835,7 +837,7 @@ static void io_watchdog_func(struct timer_list *t)
}
}
if (!list_empty(&ohci->eds_in_use)) {
-   ohci->prev_frame_no = frame_no;
+   prev_frame_no = frame_no;
ohci->prev_wdh_cnt = ohci->wdh_cnt;
ohci->prev_donehead = ohci_readl(ohci,
  

Re: Change in register_blkdev() behavior

2018-02-01 Thread Logan Gunthorpe


On 01/02/18 07:17 PM, Srivatsa S. Bhat wrote:
> Thank you for confirming! I'll send a patch to fix that (and the analogous
> case for CHRDEV_MAJOR_DYN_EXT_END).

Great! Thanks!

>>>
>>>  for (cd = chrdevs[major_to_index(i)]; cd; cd = cd->next)
>>>  if (cd->major == i)
>>>  break;
>>>
>>>  if (cd == NULL || cd->major != i)
>>>   
>>> It seems that this latter condition is unnecessary, as it will never be
>>> true. (We'll reach here only if cd == NULL or cd->major == i).
>>
>> Not quite. chrdevs[] may contain majors that also hit on the hash but don't 
>> equal 'i'. So the for loop will iterate through all hashes matching 'i' and 
>> if there is one or more and they all don't match 'i', it will fall through 
>> the loop and cd will be set to something non-null and not equal to i.
>>
> 
> Hmm, the code doesn't appear to be doing that though? The loop's fall
> through occurs one past the last entry, when cd == NULL. The only
> other way it can exit the loop is if it hits the break statement
> (which implies that cd->major == i). So what am I missing?

Oh, yup, you're right. Looking at it a second (or third) time, it should
be NULL or equal to 'i'.

Thanks,

Logan


Re: [PATCH v1] mm: optimize memory hotplug

2018-02-01 Thread kbuild test robot
Hi Pavel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on next-20180201]
[cannot apply to v4.15]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-optimize-memory-hotplug/20180202-125437
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x017-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   mm/page_alloc.c: In function 'init_reserved_page':
>> mm/page_alloc.c:1215:44: error: 'zone' undeclared (first use in this 
>> function)
 __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
   ^~~~
   mm/page_alloc.c:1215:44: note: each undeclared identifier is reported only 
once for each function it appears in

vim +/zone +1215 mm/page_alloc.c

  1196  
  1197  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
  1198  static void __meminit init_reserved_page(unsigned long pfn)
  1199  {
  1200  pg_data_t *pgdat;
  1201  int nid, zid;
  1202  
  1203  if (!early_page_uninitialised(pfn))
  1204  return;
  1205  
  1206  nid = early_pfn_to_nid(pfn);
  1207  pgdat = NODE_DATA(nid);
  1208  
  1209  for (zid = 0; zid < MAX_NR_ZONES; zid++) {
  1210  struct zone *zone = &pgdat->node_zones[zid];
  1211  
  1212  if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone))
  1213  break;
  1214  }
> 1215  __init_single_page(pfn_to_page(pfn), pfn, zone, nid);
  1216  }
  1217  #else
  1218  static inline void init_reserved_page(unsigned long pfn)
  1219  {
  1220  }
  1221  #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
  1222  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [RFCv2 05/17] media: Document the media request API

2018-02-01 Thread Alexandre Courbot
Hi Randy,

On Fri, Feb 2, 2018 at 3:14 AM, Randy Dunlap  wrote:
> On 01/31/2018 02:24 AM, Alexandre Courbot wrote:
>> From: Laurent Pinchart 
>>
>> The media request API is made of a new ioctl to implement request
>> management. Document it.
>>
>> Signed-off-by: Laurent Pinchart 
>> [acour...@chromium.org: adapt for newest API]
>> Signed-off-by: Alexandre Courbot 
>> ---
>>  Documentation/media/uapi/mediactl/media-funcs.rst  |   1 +
>>  .../media/uapi/mediactl/media-ioc-request-cmd.rst  | 141 
>> +
>>  2 files changed, 142 insertions(+)
>>  create mode 100644 
>> Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst
>>
>> diff --git a/Documentation/media/uapi/mediactl/media-funcs.rst 
>> b/Documentation/media/uapi/mediactl/media-funcs.rst
>> index 076856501cdb..e3a45d82ffcb 100644
>> --- a/Documentation/media/uapi/mediactl/media-funcs.rst
>> +++ b/Documentation/media/uapi/mediactl/media-funcs.rst
>> @@ -15,4 +15,5 @@ Function Reference
>>  media-ioc-g-topology
>>  media-ioc-enum-entities
>>  media-ioc-enum-links
>> +media-ioc-request-cmd
>>  media-ioc-setup-link
>> diff --git a/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst 
>> b/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst
>> new file mode 100644
>> index ..723b422afcce
>> --- /dev/null
>> +++ b/Documentation/media/uapi/mediactl/media-ioc-request-cmd.rst
>> @@ -0,0 +1,141 @@
>> +.. -*- coding: utf-8; mode: rst -*-
>> +
>> +.. _media_ioc_request_cmd:
>> +
>> +***
>> +ioctl MEDIA_IOC_REQUEST_CMD
>> +***
>> +
>> +Name
>> +
>> +
>> +MEDIA_IOC_REQUEST_CMD - Manage media device requests
>> +
>> +
>> +Synopsis
>> +
>> +
>> +.. c:function:: int ioctl( int fd, MEDIA_IOC_REQUEST_CMD, struct 
>> media_request_cmd *argp )
>> +:name: MEDIA_IOC_REQUEST_CMD
>> +
>> +
>> +Arguments
>> +=
>> +
>> +``fd``
>> +File descriptor returned by :ref:`open() `.
>> +
>> +``argp``
>> +
>> +
>> +Description
>> +===
>> +
>> +The MEDIA_IOC_REQUEST_CMD ioctl allow applications to manage media device
>
>allows
>
>> +requests. A request is an object that can group media device configuration
>> +parameters, including subsystem-specific parameters, in order to apply all 
>> the
>> +parameters atomically. Applications are responsible for allocating and
>> +deleting requests, filling them with configuration parameters submitting 
>> them.
>
> and 
> submitting them.
>
>> +
>> +Request operations are performed by calling the MEDIA_IOC_REQUEST_CMD ioctl
>> +with a pointer to a struct :c:type:`media_request_cmd` with the cmd field 
>> set
>> +to the appropriate command. :ref:`media-request-command` lists the commands
>> +supported by the ioctl.
>> +
>> +The struct :c:type:`media_request_cmd` request field contains the file
>> +descriptorof the request on which the command operates. For the
>
>descriptor of
>
>> +``MEDIA_REQ_CMD_ALLOC`` command the field is set to zero by applications and
>> +filled by the driver. For all other commands the field is set by 
>> applications
>> +and left untouched by the driver.
>> +
>> +To allocate a new request applications use the ``MEDIA_REQ_CMD_ALLOC``
>> +command. The driver will allocate a new request and return its FD in the
>> +request field. After allocation, the request is "empty", which means that it
>> +does not hold any state of its own, and that the hardware's state will not 
>> be
>> +affected by it unless it is passed as argument to V4L2 or media controller
>> +commands.
>> +
>> +Requests are reference-counted. A newly allocated request is referenced
>> +by the returned file descriptor, and can be later referenced by
>> +subsystem-specific operations. Requests will thus be automatically deleted
>> +when they're not longer used after the returned file descriptor is closed.
>
>  no longer
>
>> +
>> +If a request isn't needed applications can delete it by calling ``close()``
>> +on it. The driver will drop the file handle reference. The request will not
>> +be usable through the MEDIA_IOC_REQUEST_CMD ioctl anymore, but will only be
>> +deleted when the last reference is released. If no other reference exists 
>> when
>> +``close()`` is invoked the request will be deleted immediately.
>> +
>> +After creating a request applications should fill it with configuration
>> +parameters. This is performed through subsystem-specific request APIs 
>> outside
>> +the scope of the media controller API. See the appropriate subsystem APIs 
>> for
>> +more information, including how they interact with the MEDIA_IOC_REQUEST_CMD
>> +ioctl.
>> +
>> +Once a request contains all the desired configuration parameters it can be
>> +submitted using the ``MEDIA_REQ_CMD_SUBMIT`` command. This will let the
>> +buffers queued for the request be passed to their respective drivers, which
>> +will t

Re: [PATCH 6/6] Pmalloc: self-test

2018-02-01 Thread kbuild test robot
Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All error/warnings (new ones prefixed by >>):

   mm/pmalloc-selftest.c: In function 'pmalloc_selftest':
>> mm/pmalloc-selftest.c:43:14: error: implicit declaration of function 
>> 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
 var_vmall = vmalloc(SIZE_2);
 ^~~
 kvmalloc
>> mm/pmalloc-selftest.c:43:12: warning: assignment makes pointer from integer 
>> without a cast [-Wint-conversion]
 var_vmall = vmalloc(SIZE_2);
   ^
>> mm/pmalloc-selftest.c:52:2: error: implicit declaration of function 'vfree'; 
>> did you mean 'kvfree'? [-Werror=implicit-function-declaration]
 vfree(var_vmall);
 ^
 kvfree
   cc1: some warnings being treated as errors

vim +43 mm/pmalloc-selftest.c

19  
20  #define validate_alloc(expected, variable, size)\
21  pr_notice("must be " expected ": %s",   \
22is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
23  
24  #define is_alloc_ok(variable, size) \
25  validate_alloc("ok", variable, size)
26  
27  #define is_alloc_no(variable, size) \
28  validate_alloc("no", variable, size)
29  
30  void pmalloc_selftest(void)
31  {
32  struct gen_pool *pool_unprot;
33  struct gen_pool *pool_prot;
34  void *var_prot, *var_unprot, *var_vmall;
35  
36  pr_notice("pmalloc self-test");
37  pool_unprot = pmalloc_create_pool("unprotected", 0);
38  pool_prot = pmalloc_create_pool("protected", 0);
39  BUG_ON(!(pool_unprot && pool_prot));
40  
41  var_unprot = pmalloc(pool_unprot,  SIZE_1 - 1, GFP_KERNEL);
42  var_prot = pmalloc(pool_prot,  SIZE_1, GFP_KERNEL);
  > 43  var_vmall = vmalloc(SIZE_2);
44  is_alloc_ok(var_unprot, 10);
45  is_alloc_ok(var_unprot, SIZE_1);
46  is_alloc_ok(var_unprot, PAGE_SIZE);
47  is_alloc_no(var_unprot, SIZE_1 + 1);
48  is_alloc_no(var_vmall, 10);
49  
50  
51  pfree(pool_unprot, var_unprot);
  > 52  vfree(var_vmall);

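For the record, both errors above (vmalloc/vfree implicitly declared) typically just mean the new file never includes the vmalloc header; a likely fix, untested and with the hunk position invented for illustration, would be:

```diff
--- a/mm/pmalloc-selftest.c
+++ b/mm/pmalloc-selftest.c
@@ -1,3 +1,4 @@
+#include <linux/vmalloc.h>
 #include <linux/printk.h>
```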
---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] of: cache phandle nodes to decrease cost of of_find_node_by_phandle()

2018-02-01 Thread Chintan Pandya



On 2/2/2018 12:40 AM, Frank Rowand wrote:

On 02/01/18 02:31, Chintan Pandya wrote:



Anyways, will fix this locally and share test results.


Thanks, I look forward to the results.



Set up for this time was slightly different. So, taken all the numbers again.

Boot to shell time (in seconds): Experiment 2
[1] Base    : 14.843805 14.784842 14.842338
[2] 64-entry fixed cache    : 14.189292 14.23 14.266711
[3] Dynamic freeable cache    : 14.112412 14.064772 14.036052

So [3] (this patch) improves boot time by about 750 ms on average over the base build.



Is this with the many debug options enabled?  If so, can you repeat with
a normal configuration?


Could you explain the point of doing this experiment in perf mode? I 
don't have a setup for taking these numbers in perf mode; for that, I 
would need to ask another team and do round-trip follow-ups. In my 
setup, I rely on serial console logging, which gets disabled in perf mode.




Thanks,

Frank



Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project


Re: [PATCH 4/6] Protectable Memory

2018-02-01 Thread kbuild test robot
Hi Igor,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/pmalloc.o: In function `pmalloc_pool_show_chunks':
>> pmalloc.c:(.text+0x50): undefined reference to `gen_pool_for_each_chunk'
   mm/pmalloc.o: In function `pmalloc_pool_show_size':
>> pmalloc.c:(.text+0x6e): undefined reference to `gen_pool_size'
   mm/pmalloc.o: In function `pmalloc_pool_show_avail':
>> pmalloc.c:(.text+0x8a): undefined reference to `gen_pool_avail'
   mm/pmalloc.o: In function `pmalloc_chunk_free':
>> pmalloc.c:(.text+0x171): undefined reference to `gen_pool_flush_chunk'
   mm/pmalloc.o: In function `pmalloc_create_pool':
>> pmalloc.c:(.text+0x19b): undefined reference to `gen_pool_create'
>> pmalloc.c:(.text+0x2bb): undefined reference to `gen_pool_destroy'
   mm/pmalloc.o: In function `pmalloc_prealloc':
>> pmalloc.c:(.text+0x350): undefined reference to `gen_pool_add_virt'
   mm/pmalloc.o: In function `pmalloc':
>> pmalloc.c:(.text+0x3a7): undefined reference to `gen_pool_alloc'
   pmalloc.c:(.text+0x3f1): undefined reference to `gen_pool_add_virt'
   pmalloc.c:(.text+0x401): undefined reference to `gen_pool_alloc'
   mm/pmalloc.o: In function `pmalloc_destroy_pool':
   pmalloc.c:(.text+0x4a1): undefined reference to `gen_pool_for_each_chunk'
   pmalloc.c:(.text+0x4a8): undefined reference to `gen_pool_destroy'
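All of the unresolved gen_pool_* symbols above live in lib/genalloc.c, which is only built when CONFIG_GENERIC_ALLOCATOR is set, and i386 tinyconfig does not enable it. The usual cure is for the feature's Kconfig entry to select the library, along these lines (the option name here is a guess, not taken from the series):

```
config PROTECTABLE_MEMORY
	bool "Protectable memory allocator"
	depends on MMU
	select GENERIC_ALLOCATOR
```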

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH] of: cache phandle nodes to decrease cost of of_find_node_by_phandle()

2018-02-01 Thread Chintan Pandya



On 2/2/2018 2:39 AM, Frank Rowand wrote:

On 02/01/18 06:24, Rob Herring wrote:

And so
far, no one has explained why a bigger cache got slower.


Yes, I still find that surprising.


I thought a bit about this and realized that increasing the cache size 
should improve performance only if the smaller cache sees too many 
misses. So I looked up the logs from my experiments some time back and 
examined the access pattern. It seems there is *not_too_much* juggling 
during lookup by phandle.


See the access pattern here: 
https://drive.google.com/file/d/1qfAD8OsswNJABgAwjJf6Gr_JZMeK7rLV/view?usp=sharing


Sample log is pasted below where number in the last is phandle value.
Line 8853: [   37.425405] OF: want to search this 262
Line 8854: [   37.425453] OF: want to search this 262
Line 8855: [   37.425499] OF: want to search this 262
Line 8856: [   37.425549] OF: want to search this 15
Line 8857: [   37.425599] OF: want to search this 5
Line 8858: [   37.429989] OF: want to search this 253
Line 8859: [   37.430058] OF: want to search this 253
Line 8860: [   37.430217] OF: want to search this 253
Line 8861: [   37.430278] OF: want to search this 253
Line 8862: [   37.430337] OF: want to search this 253
Line 8863: [   37.430399] OF: want to search this 254
Line 8864: [   37.430597] OF: want to search this 254
Line 8865: [   37.430656] OF: want to search this 254


The above explains why cache sizes 64 and 128 give almost identical 
results. For cache size 256, however, performance degrades. I don't 
have a good theory here, but I assume that by making the SW cache 
large, we lose the benefit of the real HW cache, which is typically 
smaller than our array. Also, in my setup I've set max_cpu=1 to reduce 
the variance. That again should affect the HW cache residency pattern 
and hence the perf numbers.



Chintan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum, a Linux Foundation
Collaborative Project


Re: possible deadlock in get_user_pages_unlocked

2018-02-01 Thread Al Viro
On Thu, Feb 01, 2018 at 09:35:02PM -0800, Eric Biggers wrote:

> Try starting up multiple instances of the program; that sometimes helps with
> these races that are hard to hit (since you may e.g. have a different number 
> of
> CPUs than syzbot used).  If I start up 4 instances I see the lockdep splat 
> after
> around 2-5 seconds.

5 instances in parallel, 10 minutes into the run...

>  This is on latest Linus tree (4bf772b1467).  Also note the
> reproducer uses KVM, so if you're running it in a VM it will only work if 
> you've
> enabled nested virtualization on the host (kvm_intel.nested=1).

cat /sys/module/kvm_amd/parameters/nested 
1

on host

> Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
> get_user_page_nowait() to get_user_pages_unlocked()").

That simply prevents this reproducer hitting get_user_pages_unlocked()
instead of grab mmap_sem/get_user_pages/drop mmap_sem.  I.e. does not
allow __get_user_pages_locked() to drop/regain ->mmap_sem.

The bug may be in the way we call get_user_pages_unlocked() in that
commit, but it might easily be a bug in __get_user_pages_locked()
exposed by that reproducer somehow.


Re: [PATCH 4/6] Protectable Memory

2018-02-01 Thread kbuild test robot
Hi Igor,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-randconfig-x071-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   mm/pmalloc.c: In function 'pmalloc_pool_show_avail':
>> mm/pmalloc.c:71:25: warning: format '%lu' expects argument of type 'long 
>> unsigned int', but argument 3 has type 'size_t {aka unsigned int}' 
>> [-Wformat=]
 return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
  ~~^ ~~
  %u
   mm/pmalloc.c: In function 'pmalloc_pool_show_size':
   mm/pmalloc.c:81:25: warning: format '%lu' expects argument of type 'long 
unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
 return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
  ~~^ ~
  %u

vim +71 mm/pmalloc.c

63  
64  static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
65 struct kobj_attribute *attr,
66 char *buf)
67  {
68  struct pmalloc_data *data;
69  
70  data = container_of(attr, struct pmalloc_data, attr_avail);
  > 71  return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
72  }
73  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH v6 4/6] iommu/arm-smmu: Add the device_link between masters and smmu

2018-02-01 Thread Sricharan R
Hi Robin/Vivek,

On 2/1/2018 2:23 PM, Vivek Gautam wrote:
> Hi,
> 
> 
> On 1/31/2018 6:39 PM, Robin Murphy wrote:
>> On 19/01/18 11:43, Vivek Gautam wrote:
>>> From: Sricharan R 
>>>
>>> Finally add the device link between the master device and
>>> smmu, so that the smmu gets runtime enabled/disabled only when the
>>> master needs it. This is done from add_device callback which gets
>>> called once when the master is added to the smmu.
>>
>> Don't we need to balance this with a device_link_del() in .remove_device 
>> (like exynos-iommu does)?
> 
> Right. Will add device_link_del() call. Thanks for pointing out.

 The reason for not adding device_link_del() from .remove_device was that the core
 device_del(), which calls .remove_device via the notifier, invokes device_links_purge()
 before that. device_links_purge() does the same thing as device_link_del(), so by the
 time .remove_device is called, the device links for that device are already cleaned up.
 Vivek, you may still want to verify that calling device_link_del() from .remove_device
 has no effect, just to confirm once more.

Regards,
 Sricharan
 
> 
> regards
> Vivek
> 
>>
>> Robin.
>>
>>> Signed-off-by: Sricharan R 
>>> ---
>>>   drivers/iommu/arm-smmu.c | 11 +++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index 95478bfb182c..33bbcfedb896 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -1367,6 +1367,7 @@ static int arm_smmu_add_device(struct device *dev)
>>>   struct arm_smmu_device *smmu;
>>>   struct arm_smmu_master_cfg *cfg;
>>>   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>>> +    struct device_link *link;
>>>   int i, ret;
>>>     if (using_legacy_binding) {
>>> @@ -1428,6 +1429,16 @@ static int arm_smmu_add_device(struct device *dev)
>>>     pm_runtime_put_sync(smmu->dev);
>>>   +    /*
>>> + * Establish the link between smmu and master, so that the
>>> + * smmu gets runtime enabled/disabled as per the master's
>>> + * needs.
>>> + */
>>> +    link = device_link_add(dev, smmu->dev, DL_FLAG_PM_RUNTIME);
>>> +    if (!link)
>>> +    dev_warn(smmu->dev, "Unable to create device link between %s and 
>>> %s\n",
>>> + dev_name(smmu->dev), dev_name(dev));
>>> +
>>>   return 0;
>>>     out_cfg_free:
>>>
> 

-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation


Re: possible deadlock in get_user_pages_unlocked

2018-02-01 Thread Eric Biggers
On Fri, Feb 02, 2018 at 04:50:20AM +, Al Viro wrote:
> On Thu, Feb 01, 2018 at 04:58:00PM -0800, syzbot wrote:
> > Hello,
> > 
> > syzbot hit the following crash on upstream commit
> > 7109a04eae81c41ed529da9f3c48c3655ccea741 (Thu Feb 1 17:37:30 2018 +)
> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
> > 
> > So far this crash happened 2 times on upstream.
> > C reproducer is attached.
> 
> Umm...  How reproducible that is?
> 
> > syzkaller reproducer is attached.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
> 
> Can't reproduce with gcc 5.4.1 (same .config, same C reproducer).
> 
> It looks like __get_user_pages_locked() returning with *locked zeroed,
> but ->mmap_sem not dropped.  I don't see what could've lead to it and
> attempts to reproduce had not succeeded so far...
> 
> How long does it normally take for lockdep splat to trigger?
> 

Try starting up multiple instances of the program; that sometimes helps with
these races that are hard to hit (since you may e.g. have a different number of
CPUs than syzbot used).  If I start up 4 instances I see the lockdep splat after
around 2-5 seconds.  This is on latest Linus tree (4bf772b1467).  Also note the
reproducer uses KVM, so if you're running it in a VM it will only work if you've
enabled nested virtualization on the host (kvm_intel.nested=1).

Also it appears to go away if I revert ce53053ce378c21 ("kvm: switch
get_user_page_nowait() to get_user_pages_unlocked()").

- Eric


Re: [PATCH 1/3] ARM: dts: imx6ul-evk: Add support for mag3110 sensor

2018-02-01 Thread Shawn Guo
On Tue, Jan 09, 2018 at 09:46:23AM -0200, Marco Franchi wrote:
> The i.MX 6UL EVK has a MAG3110 Magnetometer sensor in its base board.
> Add support for this sensor, which is included in the trivial i2c devices 
> and according to the bindings documentation, just need a compatible field 
> and an address.
> 
> Signed-off-by: Marco Franchi 

Applied all, thanks.


Re: [PATCH] fbdev: simplefb: add support for 'memory-region' property on DT node

2018-02-01 Thread Kunihiko Hayashi
Hi Andy,

On Thu, 1 Feb 2018 21:03:30 +0200  wrote:

> On Thu, Feb 1, 2018 at 5:56 PM, Bartlomiej Zolnierkiewicz
>  wrote:
> > On Tuesday, January 23, 2018 08:34:56 PM Kunihiko Hayashi wrote:
> >> Enables 'memory-region' property referring to the memory description on
> >> the reserved-memory node in case of devicetree use.
> >> If there is no 'reg' property that specifies the address and size of
> >> the framebuffer, the address and size written in the memory description
> >> on the reserved-memory node can be used for the framebuffer.
> >>
> >> Furthermore, the reserved-memory node needs to have "no-map" attributes
> >> because simplefb driver maps the region by ioremap_wc().
> >>
> >> Signed-off-by: Kunihiko Hayashi 
> 
> >> +- memory-region: phandle to a node describing memory region as framebuffer
> >> +  memory instead of reg property. The node should include
> >> +  'no-map'.
> 
> >>   mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> >> + if (!mem)
> >> + mem = simplefb_parse_dt_reserved_mem(&pdev->dev);
> 
> I'm not sure I understood why you need this entire function?
> 
> Put your memory resource ('reg' property) as part of reserved memory
> with necessary flags.

Sure, we prepare a memory resource as part of reserved memory,
for example:

reserved-memory {
fb_area: memory@0xa000 {
reg = <0xa000 0x40>;
no-map;
};
};

And we need to specify the address and size as a reg property in
the framebuffer node.

framebuffer {
compatible = "simple-framebuffer";
reg = <0xa000 0x40>;
};

This function allows us to specify the area with a phandle to
the reserved memory instead of repeating the same address and size.

framebuffer {
compatible = "simple-framebuffer";
memory-region = <&fb_area>;
};

If both reg and memory-region properties are specified
in the framebuffer node, the reg property will be applied.
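For readers following along, the helper referenced in the patch excerpt, simplefb_parse_dt_reserved_mem(), presumably has roughly this shape. This is a sketch reconstructed from the description, not the actual submission, and the static resource is purely illustrative:

```c
/* sketch only -- kernel context, error handling trimmed */
static struct resource *simplefb_parse_dt_reserved_mem(struct device *dev)
{
	static struct resource r;	/* illustrative; not how the patch stores it */
	struct device_node *np;
	int ret;

	np = of_parse_phandle(dev->of_node, "memory-region", 0);
	if (!np)
		return NULL;

	ret = of_address_to_resource(np, 0, &r);
	of_node_put(np);

	return ret ? NULL : &r;
}
```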

---
Best Regards,
Kunihiko Hayashi



Re: [RFC PATCH v1 13/13] mm: splice local lists onto the front of the LRU

2018-02-01 Thread Aaron Lu
On Wed, Jan 31, 2018 at 06:04:13PM -0500, daniel.m.jor...@oracle.com wrote:
> Now that release_pages is scaling better with concurrent removals from
> the LRU, the performance results (included below) showed increased
> contention on lru_lock in the add-to-LRU path.
> 
> To alleviate some of this contention, do more work outside the LRU lock.
> Prepare a local list of pages to be spliced onto the front of the LRU,
> including setting PageLRU in each page, before taking lru_lock.  Since
> other threads use this page flag in certain checks outside lru_lock,
> ensure each page's LRU links have been properly initialized before
> setting the flag, and use memory barriers accordingly.
> 
> Performance Results
> 
> This is a will-it-scale run of page_fault1 using 4 different kernels.
> 
> kernel kern #
> 
>   4.15-rc2  1
>   large-zone-batch  2
>  lru-lock-base  3
>lru-lock-splice  4
> 
> Each kernel builds on the last.  The first is a baseline, the second
> makes zone->lock more scalable by increasing an order-0 per-cpu
> pagelist's 'batch' and 'high' values to 310 and 1860 respectively

Since the purpose of the patchset is to optimize lru_lock, you may
consider adjusting pcp->high to be >= 32768(page_fault1's test size is
128M = 32768 pages). That should eliminate zone->lock contention
entirely.

> (courtesy of Aaron Lu's patch), the third scales lru_lock without
> splicing pages (the previous patch in this series), and the fourth adds
> page splicing (this patch).
> 
> N tasks mmap, fault, and munmap anonymous pages in a loop until the test
> time has elapsed.


Re: [PATCH] fbdev: simplefb: add support for 'memory-region' property on DT node

2018-02-01 Thread Kunihiko Hayashi
Hi Bartlomiej, Rob, Mark,

On Thu, 1 Feb 2018 16:56:08 +0100  wrote:
> 
> Hi,
> 
> On Tuesday, January 23, 2018 08:34:56 PM Kunihiko Hayashi wrote:
> > Enables 'memory-region' property referring to the memory description on
> > the reserved-memory node in case of devicetree use.
> > If there is no 'reg' property that specifies the address and size of
> > the framebuffer, the address and size written in the memory description
> > on the reserved-memory node can be used for the framebuffer.
> > 
> > Furthermore, the reserved-memory node needs to have "no-map" attributes
> > because simplefb driver maps the region by ioremap_wc().
> > 
> > Signed-off-by: Kunihiko Hayashi 
> 
> This needs an ACK from Rob or Mark (DT bindings Maintainers).

Thanks for pointing out.

Rob, Mark, would you please confirm the patch?
This patch contains the addition to dt-bindings.

---
Best Regards,
Kunihiko Hayashi



Re: [RFC PATCH v1 13/13] mm: splice local lists onto the front of the LRU

2018-02-01 Thread Daniel Jordan

On 02/01/2018 06:30 PM, Tim Chen wrote:

On 01/31/2018 03:04 PM, daniel.m.jor...@oracle.com wrote:

Now that release_pages is scaling better with concurrent removals from
the LRU, the performance results (included below) showed increased
contention on lru_lock in the add-to-LRU path.

To alleviate some of this contention, do more work outside the LRU lock.
Prepare a local list of pages to be spliced onto the front of the LRU,
including setting PageLRU in each page, before taking lru_lock.  Since
other threads use this page flag in certain checks outside lru_lock,
ensure each page's LRU links have been properly initialized before
setting the flag, and use memory barriers accordingly.

Performance Results

This is a will-it-scale run of page_fault1 using 4 different kernels.

 kernel kern #

   4.15-rc2  1
   large-zone-batch  2
  lru-lock-base  3
lru-lock-splice  4

Each kernel builds on the last.  The first is a baseline, the second
makes zone->lock more scalable by increasing an order-0 per-cpu
pagelist's 'batch' and 'high' values to 310 and 1860 respectively
(courtesy of Aaron Lu's patch), the third scales lru_lock without
splicing pages (the previous patch in this series), and the fourth adds
page splicing (this patch).

N tasks mmap, fault, and munmap anonymous pages in a loop until the test
time has elapsed.

The process case generally does better than the thread case most likely
because of mmap_sem acting as a bottleneck.  There's ongoing work
upstream[*] to scale this lock, however, and once it goes in, my
hypothesis is the thread numbers here will improve.


Neglected to mention my hardware:
  2-socket system, 44 cores, 503G memory, Intel(R) Xeon(R) CPU E5-2699 v4 @ 
2.20GHz



kern #  ntask  proc speedup  thr speedup  proc pgf/s  proc stdev  thr pgf/s  thr stdev
  1      1                                   705,533       1,644    705,227      1,122
  2      1         2.5%         2.8%         722,912         453    724,807        728
  3      1         2.6%         2.6%         724,215         653    723,213        941
  4      1         2.3%         2.8%         721,746         272    724,944        728

kern #  ntask  proc speedup  thr speedup  proc pgf/s  proc stdev  thr pgf/s  thr stdev
  1      4                                 2,525,487       7,428  1,973,616     12,568
  2      4         2.6%         7.6%       2,590,699       6,968  2,123,570     10,350
  3      4         2.3%         4.4%       2,584,668      12,833  2,059,822     10,748
  4      4         4.7%         5.2%       2,643,251      13,297  2,076,808      9,506

kern #  ntask  proc speedup  thr speedup  proc pgf/s  proc stdev  thr pgf/s  thr stdev
  1     16                                 6,444,656      20,528  3,226,356     32,874
  2     16         1.9%        10.4%       6,566,846      20,803  3,560,437     64,019
  3     16        18.3%         6.8%       7,624,749      58,497  3,447,109     67,734
  4     16        28.2%         2.5%       8,264,125      31,677  3,306,679     69,443

kern #  ntask  proc speedup  thr speedup  proc pgf/s  proc stdev  thr pgf/s  thr stdev
  1     32                                11,564,988      32,211  2,456,507     38,898
  2     32         1.8%         1.5%      11,777,119      45,418  2,494,064     27,964
  3     32        16.1%        -2.7%      13,426,746      94,057  2,389,934     40,186
  4     32        26.2%         1.2%      14,593,745      28,121  2,486,059     42,004

kern #  ntask  proc speedup  thr speedup  proc pgf/s  proc stdev  thr pgf/s  thr stdev
  1     64                                12,080,629      33,676  2,443,043     61,973
  2     64         3.9%         9.9%      12,551,136     206,202  2,684,632     69,483
  3     64        15.0%        -3.8%      13,892,933     351,657  2,351,232     67,875
  4     64        21.9%         1.8%      14,728,765      64,945  2,485,940     66,839

[*] https://lwn.net/Articles/724502/  Range reader/writer locks
 https://lwn.net/Articles/744188/  Speculative page faults



The speedup looks pretty nice and seems to peak at 16 tasks.  Do you have an 
explanation of what
causes the drop from 28.2% to 21.9% going from 16 to 64 tasks?


The system I was testing on had 44 cores, so part of the decrease in % speedup 
is just saturating the hardware (e.g. memory bandwidth).  At 64 processes, we 
start having to share cores.  Page faults per second did continue to increase 
each time we added more processes, though, so there's no anti-scaling going on.


Was
the loss in performance due to increased contention on LRU lock when more tasks 
running
results in a higher likelihood of hitting the sentinel?


That seems to be another factor, yes.  I used lock_stat to measure it, and it 
showed that wait time on lru_lock nearly tripled when going from 32 to 64 
processes, but I also take lock_stat with a grain of salt as it changes the 
timing/interaction 

Re: possible deadlock in get_user_pages_unlocked

2018-02-01 Thread Al Viro
On Thu, Feb 01, 2018 at 04:58:00PM -0800, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 7109a04eae81c41ed529da9f3c48c3655ccea741 (Thu Feb 1 17:37:30 2018 +)
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
> 
> So far this crash happened 2 times on upstream.
> C reproducer is attached.

Umm...  How reproducible that is?

> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.

Can't reproduce with gcc 5.4.1 (same .config, same C reproducer).

It looks like __get_user_pages_locked() returning with *locked zeroed,
but ->mmap_sem not dropped.  I don't see what could've lead to it and
attempts to reproduce had not succeeded so far...

How long does it normally take for lockdep splat to trigger?


Re: [PATCH v2 2/2] mm, memory_hotplug: optimize memory hotplug

2018-02-01 Thread kbuild test robot
Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on next-20180201]
[cannot apply to v4.15]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x019-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/page_ref.h:7:0,
from include/linux/mm.h:26,
from mm/sparse.c:5:
   mm/sparse.c: In function 'sparse_add_one_section':
   include/linux/page-flags.h:159:29: warning: overflow in implicit constant 
conversion [-Woverflow]
#define PAGE_POISON_PATTERN ~0ul
^
>> mm/sparse.c:838:17: note: in expansion of macro 'PAGE_POISON_PATTERN'
 memset(memmap, PAGE_POISON_PATTERN,
^~~
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 3 include/linux/log2.h:is_power_of_2
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 3 include/linux/string.h:memset
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/nodemask.h:node_state
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_lock
   Cyclomatic Complexity 1 include/linux/memory_hotplug.h:pgdat_resize_unlock
   Cyclomatic Complexity 1 include/linux/mmzone.h:pfn_to_section_nr
   Cyclomatic Complexity 1 include/linux/mmzone.h:section_nr_to_pfn
   Cyclomatic Complexity 3 include/linux/mmzone.h:__nr_to_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:__section_mem_map_addr
   Cyclomatic Complexity 3 include/linux/mmzone.h:present_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:present_section_nr
   Cyclomatic Complexity 3 include/linux/mmzone.h:valid_section
   Cyclomatic Complexity 1 include/linux/mmzone.h:valid_section_nr
   Cyclomatic Complexity 1 include/linux/mmzone.h:__pfn_to_section
   Cyclomatic Complexity 2 include/linux/mmzone.h:pfn_present
   Cyclomatic Complexity 1 arch/x86/include/asm/topology.h:numa_node_id
   Cyclomatic Complexity 1 include/linux/topology.h:numa_mem_id
   Cyclomatic Complexity 1 include/linux/mm.h:is_vmalloc_addr
   Cyclomatic Complexity 1 include/linux/mm.h:page_to_section
   Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 1 include/linux/slab.h:__kmalloc_node
   Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_node_trace
   Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc_node
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc_node
   Cyclomatic Complexity 1 include/linux/bootmem.h:alloc_remap
   Cyclomatic Complexity 1 mm/sparse.c:set_section_nid
   Cyclomatic Complexity 1 mm/sparse.c:sparse_encode_early_nid
   Cyclomatic Complexity 1 mm/sparse.c:sparse_early_nid
   Cyclomatic Complexity 4 mm/sparse.c:next_present_section_nr
   Cyclomatic Complexity 1 mm/sparse.c:check_usemap_section_nr
   Cyclomatic Complexity 6 mm/sparse.c:alloc_usemap_and_memmap
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_virt_alloc_node
   Cyclomatic Complexity 2 mm/sparse.c:sparse_index_alloc
   Cyclomatic Complexity 3 mm/sparse.c:sparse_index_init
   Cyclomatic Complexity 1 
include/linux/bootmem.h:memblock_virt_alloc_node_nopanic
   Cyclomatic Complexity 1 mm/sparse.c:sparse_early_usemaps_alloc_pgdat_section
   Cyclomatic Complexity 2 mm/sparse.c:sparse_encode_mem_map
   Cyclomatic Complexity 2 mm/sparse.c:sparse_init_one_section
   Cyclomatic Complexity 1 include/linux/bootmem.h:memblock_free_early
   Cyclomatic Complexity 1 include/linux/gfp.h:__alloc_pages
   Cyclomatic Complexity 2 include/linux/gfp.h:__alloc_pages_node
   Cyclomatic Complexity 2 include/linux/gfp.h:alloc_pages_node
   Cyclomatic Complexity 70 mm/sparse.c:__kmalloc_section_memmap
   Cyclomatic Complexity 1 mm/sparse.c:kmalloc_section_memmap
   Cyclomatic Complexity 2 mm/sparse.c:__kfree_section_memmap
   Cyclomatic Complexity 3 mm/sparse.c:get_section_nid
   Cyclomatic Complexity 5 mm/sparse.c:__section_nr
   Cyclomatic Complexity 2 mm/sparse.c:section_mark_present
   Cyclomatic Complexity 7 mm/sparse.c:mminit_validate_memmodel_limi

Re: [GIT PULL tools] Linux kernel memory model

2018-02-01 Thread Boqun Feng
On Wed, Jan 31, 2018 at 05:17:28PM -0800, Paul E. McKenney wrote:
[...]
> > - A long term question: have you considered and would it make sense to 
> > generate a
> >   memory-barriers.txt like file directly into Documentation/locking/, using 
> > the
> >   formal description? That way any changes/extensions/fixes to the model 
> > could be
> >   tracked on a high level, without readers having to understand the formal
> >   representation.
> 
> I hadn't considered this at all, actually.  ;-)
> 
> The sections of memory-barriers.txt dealing with MMIO ordering would need
> to stay hand-generated, but they are a very small fraction of the total.
> The herd7 tool is capable of generating cool diagrams sort of like
> this one: https://static.lwn.net/images/2017/mm-model/rmo-acyclic.png,
> which might replace at least some of the hand-generated ASCII-art
> diagrams.
> 

Which reminds me, one thing we could start with is to try to convert all
the examples with litmus tests. Has this been done somewhere (e.g. in
your litmus github repo)? If not, I can try if you think that's a good
idea.
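As a data point, the classic message-passing pattern from memory-barriers.txt comes out looking like this in the LKMM litmus dialect (written from memory; the in-tree litmus-tests directory should have the canonical version):

```
C MP+pooncerelease+poacquireonce

{}

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_store_release(y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(y);
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)
```

Run under herd7 with the kernel memory model, the "exists" clause should never be satisfied, i.e. the release/acquire pair forbids observing the flag without the data.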

Regards,
Boqun

> Although I do confess harboring some skepticism about being able to
> generated high-quality text, there is no denying that it would be
> valuable to be able to do so.
> 
> > In any case, the base commit is certainly nice and clean and I've pulled it 
> > into 
> > tip:locking/core for a v4.17 merge.
> 
> Very good!
> 
> > I believe these additional improvements (to the extent you agree with doing 
> > them!) 
> > could/should be done as add-on commits on top of this existing commit.
> 
> Sounds good!
> 
> Would you prefer a pull request or a patch series for these?
> 
>   Thanx, Paul
> 


signature.asc
Description: PGP signature


Re: [PATCH] arm64: acpi,efi: fix alignment fault in accessing ACPI tables at kdump

2018-02-01 Thread AKASHI Takahiro
On Thu, Feb 01, 2018 at 05:34:26PM +, Ard Biesheuvel wrote:
> On 1 February 2018 at 09:04, AKASHI Takahiro  
> wrote:
> > This is a fix against the issue that crash dump kernel may hang up
> > during booting, which can happen on any ACPI-based system with "ACPI
> > Reclaim Memory."
> >
> > (kernel messages after panic kicked off kdump)
> >(snip...)
> > Bye!
> >(snip...)
> > ACPI: Core revision 20170728
> > pud=2e7d0003, *pmd=2e7c0003, *pte=00e839710707
> > Internal error: Oops: 9621 [#1] SMP
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> > task: 08d05180 task.stack: 08cc
> > PC is at acpi_ns_lookup+0x25c/0x3c0
> > LR is at acpi_ds_load1_begin_op+0xa4/0x294
> >(snip...)
> > Process swapper/0 (pid: 0, stack limit = 0x08cc)
> > Call trace:
> >(snip...)
> > [] acpi_ns_lookup+0x25c/0x3c0
> > [] acpi_ds_load1_begin_op+0xa4/0x294
> > [] acpi_ps_build_named_op+0xc4/0x198
> > [] acpi_ps_create_op+0x14c/0x270
> > [] acpi_ps_parse_loop+0x188/0x5c8
> > [] acpi_ps_parse_aml+0xb0/0x2b8
> > [] acpi_ns_one_complete_parse+0x144/0x184
> > [] acpi_ns_parse_table+0x48/0x68
> > [] acpi_ns_load_table+0x4c/0xdc
> > [] acpi_tb_load_namespace+0xe4/0x264
> > [] acpi_load_tables+0x48/0xc0
> > [] acpi_early_init+0x9c/0xd0
> > [] start_kernel+0x3b4/0x43c
> > Code: b9008fb9 2a000318 36380054 32190318 (b94002c0)
> > ---[ end trace c46ed37f9651c58e ]---
> > Kernel panic - not syncing: Fatal exception
> > Rebooting in 10 seconds..
> >
> > (diagnosis)
> > * This fault is a data abort, alignment fault (ESR=0x9621)
> >   during reading out ACPI table.
> > * Initial ACPI tables are normally stored in system ram and marked as
> >   "ACPI Reclaim memory" by the firmware.
> > * After the commit f56ab9a5b73c ("efi/arm: Don't mark ACPI reclaim
> >   memory as MEMBLOCK_NOMAP"), those regions are differently handled
> >   as they are "memblock-reserved", without NOMAP bit.
> > * So they are now excluded from device tree's "usable-memory-range"
> >   which kexec-tools determines based on a current view of /proc/iomem.
> 
> So this patch does fix the fallout of how this issue affects ACPI boot
> in particular, but is it correct in the first place for the kexec
> tools to disregard memory that has been memblock_reserved()?

My previous patch[1] is just for this. It would work not only for
the case here but also for any "reserved" memory regions.
The only noticeable drawback that I see in this approach is that
the meaning of "usable-memory-range" is now a bit obscure.

Alternatively we may want to modify /proc/iomem, adding a specific
resource entry for "ACPI Reclaim regions," and in turn kexec-tools so as
to put them into "usable-memory-range", but that is rather a stopgap
measure in my opinion. I don't like kexec relying heavily on a userspace
tool, as exporting the reserved regions in /proc/iomem is also useless
for most users.

(In this sense, I would go for kexec_file :)

> For
> instance, the UEFI memory map is memblock_reserve()d as well, along
> with other parts of memory that have been populated by the firmware.
> 
> > * When crash dump kernel boots up, it tries to accesses ACPI tables by
> >   mapping them with ioremap(), not ioremap_cache(), in acpi_os_ioremap()
> >   since they are no longer part of mapped system ram.
> > * Given that ACPI accessor/helper functions are compiled in without
> >   unaligned access support (ACPI_MISALIGNMENT_NOT_SUPPORTED),
> >   any unaligned access to ACPI tables can cause a fatal panic.
> >
> > With this patch, acpi_os_ioremap() always honors a memory attribute
> > provided by the firmware (efi). Hence retaining cacheability in said cases
> > allows the kernel safe access to ACPI tables.
> >
> > Please note that arm_enable_runtime_services(), which is renamed to
> > efi_enter_virtual_mode() due to the similarity to x86's, is now called
> > earlier before acpi_early_init() since efi_mem_attributes() relies on
> > efi.memmap being mapped.
> >
> > Signed-off-by: AKASHI Takahiro 
> > Suggested-by: James Morse 
> > Suggested-by: Ard Biesheuvel 
> > Reported-by and Tested-by: Bhupesh Sharma 
> > ---
> >  arch/arm64/include/asm/acpi.h  | 23 ---
> >  arch/arm64/kernel/acpi.c   | 11 +++
> 
> Split into two patches --here-- please

Sure, but only if this patch is the preferred approach over [1].

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/553098.html

Thanks,
-Takahiro AKASHI

> 
> >  drivers/firmware/efi/arm-runtime.c | 15 ++-
> >  init/main.c|  3 +++
> >  4 files changed, 28 insertions(+), 24 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/acpi.h

RE: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

2018-02-01 Thread Brown, Aaron F
> From: Brown, Aaron F
> Sent: Thursday, February 1, 2018 8:30 PM
> To: 'Benjamin Poirier' ; Alexander Duyck
> 
> Cc: Netdev ; intel-wired-lan  l...@lists.osuosl.org>; linux-kernel@vger.kernel.org
> Subject: RE: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
> 
> > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> > Behalf Of Benjamin Poirier
> > Sent: Tuesday, January 30, 2018 11:31 PM
> > To: Alexander Duyck 
> > Cc: Netdev ; intel-wired-lan  > l...@lists.osuosl.org>; linux-kernel@vger.kernel.org
> > Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from
> EIAC.
> >
> > On 2018/01/30 11:46, Alexander Duyck wrote:
> > > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier
> 
> > wrote:
> > > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid
> > receiver
> > > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in
> e1000_msix_other()
> > > > on emulated e1000e devices. In comparison, on real e1000e 82574
> > hardware,
> > > > icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
> > > >
> > > > Some experimentation showed that this flaw in vmware e1000e
> > emulation can
> > > > be worked around by not setting Other in EIAC. This is how it was
> before
> > > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > > >
> > > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt
> bursts")
> > > > Signed-off-by: Benjamin Poirier 
> > > > ---
> > >
> > > Hi Benjamin,
> > >
> > > How would you feel about resubmitting this patch for net?
> > >
> > > We have some issues that have come up and it would be useful to have
> > > this fixed in the kernel sooner rather than later. I would be okay
> > > with us applying it for now while we work on coming up with a more
> > > complete solution.
> >
> > Ok, I've resent it in its original form. Once it's in mainline I'll
> > rebase the cleanups.
> 
> Tested-by: Aaron Brown
> 
Stupid line wrap ... been that kind of day.

Tested-by: Aaron Brown 

> 
> > ___
> > Intel-wired-lan mailing list
> > intel-wired-...@osuosl.org
> > https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Re: [PATCH] socket: Provide bounce buffer for constant sized put_cmsg()

2018-02-01 Thread kbuild test robot
Hi Kees,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Kees-Cook/socket-Provide-bounce-buffer-for-constant-sized-put_cmsg/20180202-113637
config: i386-randconfig-s0-201804 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/skbuff.h:23:0,
from include/linux/if_ether.h:23,
from include/uapi/linux/ethtool.h:19,
from include/linux/ethtool.h:18,
from include/linux/netdevice.h:41,
from include/net/sock.h:51,
from include/net/bluetooth/bluetooth.h:29,
from net/bluetooth/hci_sock.c:32:
   net/bluetooth/hci_sock.c: In function 'hci_sock_cmsg':
>> include/linux/socket.h:355:19: error: variable or field '_val' declared void
  typeof(*(_ptr)) _val = *(_ptr);\
  ^
>> net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
  put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
  ^~~~
>> include/linux/socket.h:355:26: warning: dereferencing 'void *' pointer
  typeof(*(_ptr)) _val = *(_ptr);\
 ^~~
>> net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
  put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
  ^~~~
>> include/linux/socket.h:355:26: error: void value not ignored as it ought to 
>> be
  typeof(*(_ptr)) _val = *(_ptr);\
 ^
>> net/bluetooth/hci_sock.c:1406:3: note: in expansion of macro 'put_cmsg'
  put_cmsg(msg, SOL_HCI, HCI_CMSG_TSTAMP, len, data);
  ^~~~
--
   In file included from include/linux/kernel.h:10:0,
from include/linux/list.h:9,
from include/linux/random.h:10,
from include/linux/net.h:22,
from net/rxrpc/recvmsg.c:14:
   In function 'rxrpc_recvmsg_new_call',
   inlined from 'rxrpc_recvmsg' at net/rxrpc/recvmsg.c:539:7:
>> include/linux/compiler.h:330:38: error: call to '__compiletime_assert_119' 
>> declared with attribute error: BUILD_BUG_ON failed: sizeof(_val) != (0)
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^
   include/linux/compiler.h:310:4: note: in definition of macro 
'__compiletime_assert'
   prefix ## suffix();\
   ^~
   include/linux/compiler.h:330:2: note: in expansion of macro 
'_compiletime_assert'
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^~~
   include/linux/build_bug.h:47:37: note: in expansion of macro 
'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~
   include/linux/build_bug.h:71:2: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
 BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
 ^~~~
>> include/linux/socket.h:356:3: note: in expansion of macro 'BUILD_BUG_ON'
  BUILD_BUG_ON(sizeof(_val) != (_len));   \
  ^~~~
>> net/rxrpc/recvmsg.c:119:8: note: in expansion of macro 'put_cmsg'
 ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NEW_CALL, 0, &tmp);
   ^~~~
   In function 'rxrpc_recvmsg_term',
   inlined from 'rxrpc_recvmsg' at net/rxrpc/recvmsg.c:562:7:
   include/linux/compiler.h:330:38: error: call to '__compiletime_assert_77' 
declared with attribute error: BUILD_BUG_ON failed: sizeof(_val) != (0)
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^
   include/linux/compiler.h:310:4: note: in definition of macro 
'__compiletime_assert'
   prefix ## suffix();\
   ^~
   include/linux/compiler.h:330:2: note: in expansion of macro 
'_compiletime_assert'
 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 ^~~
   include/linux/build_bug.h:47:37: note: in expansion of macro 
'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~
   include/linux/build_bug.h:71:2: note: in expansion of macro 
'BUILD_BUG_ON_MSG'
 BUILD_BUG_ON_MSG(condition, 

RE: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

2018-02-01 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Benjamin Poirier
> Sent: Tuesday, January 30, 2018 11:31 PM
> To: Alexander Duyck 
> Cc: Netdev ; intel-wired-lan  l...@lists.osuosl.org>; linux-kernel@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.
> 
> On 2018/01/30 11:46, Alexander Duyck wrote:
> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier 
> wrote:
> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build
> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid
> receiver
> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after
> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other()
> > > on emulated e1000e devices. In comparison, on real e1000e 82574
> hardware,
> > > icr=0x8004 (_INT_ASSERTED | _OTHER) in the same situation.
> > >
> > > Some experimentation showed that this flaw in vmware e1000e
> emulation can
> > > be worked around by not setting Other in EIAC. This is how it was before
> > > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1).
> > >
> > > Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> > > Signed-off-by: Benjamin Poirier 
> > > ---
> >
> > Hi Benjamin,
> >
> > How would you feel about resubmitting this patch for net?
> >
> > We have some issues that have come up and it would be useful to have
> > this fixed in the kernel sooner rather than later. I would be okay
> > with us applying it for now while we work on coming up with a more
> > complete solution.
> 
> Ok, I've resent it in its original form. Once it's in mainline I'll
> rebase the cleanups.

Tested-by: Aaron Brown


> ___
> Intel-wired-lan mailing list
> intel-wired-...@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


Re: [RFC PATCH v1 03/13] mm: add lock array to pgdat and batch fields to struct page

2018-02-01 Thread Daniel Jordan



On 02/01/2018 05:50 PM, Tim Chen wrote:

On 01/31/2018 03:04 PM, daniel.m.jor...@oracle.com wrote:

This patch simply adds the array of locks and struct page fields.
Ignore for now where the struct page fields are: we need to find a place
to put them that doesn't enlarge the struct.

Signed-off-by: Daniel Jordan 
---
  include/linux/mm_types.h | 5 +
  include/linux/mmzone.h   | 7 +++
  mm/page_alloc.c  | 3 +++
  3 files changed, 15 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4e5e0e..6e9d26f0cecf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -190,6 +190,11 @@ struct page {
struct kmem_cache *slab_cache;  /* SL[AU]B: Pointer to slab */
};
  
+	struct {
+		unsigned lru_batch;
+		bool lru_sentinel;

The above declaration adds at least 5 bytes to struct page.
It adds a lot of extra memory overhead when multiplied
by the number of pages in the system.


Yes, I completely agree, enlarging struct page won't cut it for the final 
solution.


We can move sentinel bool to page flag, at least for 64 bit system.


There did seem to be room for one more bit the way my kernel was configured 
(without losing a component in page->flags), but I'd have to look again.


And 8 bits are probably enough for the lru_batch id, giving a max
lru_batch count of 256 to break the locks into 256 smaller ones.
The max used in the patchset is 32 and that already gives a
pretty good spread of the locking.
It will be better if we can find some unused space in struct page
to squeeze it in.


One idea we'd had was to store the batch id in the lower bits of the mem_cgroup 
pointer.  CONFIG_MEMCG seems to be pretty ubiquitous these days, and it's a 
large enough struct (1048 bytes on one machine) to have room in the lower bits.

Another way might be to encode the previous and next lru page pointers as pfn's 
instead of struct list_head *'s, shrinking the footprint of struct page's lru 
field to allow room for the batch id.


Re: [PATCH v11 3/3] mm, x86: display pkey in smaps only if arch supports pkeys

2018-02-01 Thread kbuild test robot
Hi Ram,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ram-Pai/mm-x86-powerpc-Enhancements-to-Memory-Protection-Keys/20180202-120004
config: x86_64-randconfig-x005-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/mmu_context.h:8:0,
from arch/x86/events/core.c:36:
>> include/linux/pkeys.h:16:23: error: expected identifier or '(' before 
>> numeric constant
#define vma_pkey(vma) 0
  ^
>> arch/x86/include/asm/mmu_context.h:298:19: note: in expansion of macro 
>> 'vma_pkey'
static inline int vma_pkey(struct vm_area_struct *vma)
  ^~~~

vim +16 include/linux/pkeys.h

 7  
 8  #ifdef CONFIG_ARCH_HAS_PKEYS
 9  #include 
10  #else /* ! CONFIG_ARCH_HAS_PKEYS */
11  #define arch_max_pkey() (1)
12  #define execute_only_pkey(mm) (0)
13  #define arch_override_mprotect_pkey(vma, prot, pkey) (0)
14  #define PKEY_DEDICATED_EXECUTE_ONLY 0
15  #define ARCH_VM_PKEY_FLAGS 0
  > 16  #define vma_pkey(vma) 0
17  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [RFC PATCH v1 00/13] lru_lock scalability

2018-02-01 Thread Daniel Jordan



On 02/01/2018 10:54 AM, Steven Whitehouse wrote:

Hi,


On 31/01/18 23:04, daniel.m.jor...@oracle.com wrote:

lru_lock, a per-node* spinlock that protects an LRU list, is one of the
hottest locks in the kernel.  On some workloads on large machines, it
shows up at the top of lock_stat.

One way to improve lru_lock scalability is to introduce an array of locks,
with each lock protecting certain batches of LRU pages.

 *ooo**ooo**ooo** ...
 |   ||   ||   ||
  \ batch 1 /  \ batch 2 /  \ batch 3 /

In this ASCII depiction of an LRU, a page is represented with either '*'
or 'o'.  An asterisk indicates a sentinel page, which is a page at the
edge of a batch.  An 'o' indicates a non-sentinel page.

To remove a non-sentinel LRU page, only one lock from the array is
required.  This allows multiple threads to remove pages from different
batches simultaneously.  A sentinel page requires lru_lock in addition to
a lock from the array.

Full performance numbers appear in the last patch in this series, but this
prototype allows a microbenchmark to do up to 28% more page faults per
second with 16 or more concurrent processes.

This work was developed in collaboration with Steve Sistare.

Note: This is an early prototype.  I'm submitting it now to support my
request to attend LSF/MM, as well as get early feedback on the idea.  Any
comments appreciated.


* lru_lock is actually per-memcg, but without memcg's in the picture it
   becomes per-node.

GFS2 has an lru list for glocks, which can be contended under certain 
workloads. Work is still ongoing to figure out exactly why, but this looks like 
it might be a good approach to that issue too. The main purpose of GFS2's lru 
list is to allow shrinking of the glocks under memory pressure via the 
gfs2_scan_glock_lru() function, and it looks like this type of approach could 
be used there to improve the scalability.


Glad to hear that this could help in gfs2 as well.

Hopefully struct gfs2_glock is less space constrained than struct page for 
storing the few bits of metadata that this approach requires.

Daniel



Steve.



Aaron Lu (1):
   mm: add a percpu_pagelist_batch sysctl interface

Daniel Jordan (12):
   mm: allow compaction to be disabled
   mm: add lock array to pgdat and batch fields to struct page
   mm: introduce struct lru_list_head in lruvec to hold per-LRU batch
 info
   mm: add batching logic to add/delete/move API's
   mm: add lru_[un]lock_all APIs
   mm: convert to-be-refactored lru_lock callsites to lock-all API
   mm: temporarily convert lru_lock callsites to lock-all API
   mm: introduce add-only version of pagevec_lru_move_fn
   mm: add LRU batch lock API's
   mm: use lru_batch locking in release_pages
   mm: split up release_pages into non-sentinel and sentinel passes
   mm: splice local lists onto the front of the LRU

  include/linux/mm_inline.h | 209 +-
  include/linux/mm_types.h  |   5 ++
  include/linux/mmzone.h    |  25 +-
  kernel/sysctl.c   |   9 ++
  mm/Kconfig    |   1 -
  mm/huge_memory.c  |   6 +-
  mm/memcontrol.c   |   5 +-
  mm/mlock.c    |  11 +--
  mm/mmzone.c   |   7 +-
  mm/page_alloc.c   |  43 +-
  mm/page_idle.c    |   4 +-
  mm/swap.c | 208 -
  mm/vmscan.c   |  49 +--
  13 files changed, 500 insertions(+), 82 deletions(-)





linux-next: Tree for Feb 2

2018-02-01 Thread Stephen Rothwell
Hi all,

Please do not add any v4.17 material to your linux-next included branches
until after v4.16-rc1 has been released.

Changes since 20180201:

The pci tree lost its build failure.

The integrity tree gained a conflict against Linus' tree.

The kvm tree gained a conflict against the tip tree.

The akpm tree gained a conflict against the fscrypt tree.

Non-merge commits (relative to Linus' tree): 4785
 5050 files changed, 192884 insertions(+), 154986 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (8e44e6600caa Merge branch 'KASAN-read_word_at_a_time')
Merging fixes/master (820bf5c419e4 Merge tag 'scsi-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi)
Merging kbuild-current/fixes (36c1681678b5 genksyms: drop *.hash.c from 
.gitignore)
Merging arc-current/for-curr (a46f24acf8bc ARC: boot log: Fix trailing 
semicolon)
Merging arm-current/fixes (091f02483df7 ARM: net: bpf: clarify tail_call index)
Merging m68k-current/for-linus (2334b1ac1235 MAINTAINERS: Add NuBus subsystem 
entry)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1b689a95ce74 powerpc/pseries: include 
linux/types.h in asm/hvcall.h)
Merging sparc/master (aebb48f5e465 sparc64: fix typo in 
CONFIG_CRYPTO_DES_SPARC64 => CONFIG_CRYPTO_CAMELLIA_SPARC64)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (743efac1 net: pxa168_eth: add netconsole support)
Merging bpf/master (65073a67331d bpf: fix null pointer deref in 
bpf_prog_test_run_xdp)
Merging ipsec/master (545d8ae7afff xfrm: fix boolean assignment in 
xfrm_get_type_offload)
Merging netfilter/master (3f34cfae1238 netfilter: on sockopt() acquire sock 
lock only in the required scope)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (a9e6d44ddecc ssb: Do not disable PCI host on 
non-Mips)
Merging mac80211/master (c4de37ee2b55 mac80211: mesh: fix wrong mesh TTL offset 
calculation)
Merging rdma-fixes/for-rc (ae59c3f0b6cf RDMA/mlx5: Fix out-of-bound access 
while querying AH)
Merging sound-current/for-linus (1c9609e3a8cf ALSA: hda - Reduce the suspend 
time consumption for ALC256)
Merging pci-current/for-linus (838cda369707 x86/PCI: Enable AMD 64-bit window 
on resume)
Merging driver-core.current/driver-core-linus (30a7acd57389 Linux 4.15-rc6)
Merging tty.current/tty-linus (30a7acd57389 Linux 4.15-rc6)
Merging usb.current/usb-linus (a8750ddca918 Linux 4.15-rc8)
Merging usb-gadget-fixes/fixes (b2cd1df66037 Linux 4.15-rc7)
Merging usb-serial-fixes/usb-linus (d14ac576d10f USB: serial: cp210x: add new 
device ID ELV ALC 8xxx)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (2b88212c4cc6 phy: rcar-gen3-usb2: select USB_COMMON)
Merging staging.current/staging-linus (a8750ddca918 Linux 4.15-rc8)
Merging char-misc.current/char-misc-linus (a8750ddca918 Linux 4.15-rc8)
Merging input-current/for-linus (d67ad78e09cb Merge branch 'next' into 
for-linus)
Merging crypto-current/master (2d55807b7

Re: [PATCH v2 1/2] mm: uninitialized struct page poisoning sanity checking

2018-02-01 Thread kbuild test robot
Hi Pavel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mmotm/master]
[also build test WARNING on v4.15 next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pavel-Tatashin/mm-uninitialized-struct-page-poisoning-sanity-checking/20180202-105827
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x013-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/page_ref.h:7:0,
from include/linux/mm.h:26,
from include/linux/memblock.h:18,
from mm/memblock.c:21:
   mm/memblock.c: In function 'memblock_virt_alloc_try_nid_raw':
>> include/linux/page-flags.h:159:29: warning: overflow in implicit constant 
>> conversion [-Woverflow]
#define PAGE_POISON_PATTERN ~0ul
^
>> mm/memblock.c:1376:15: note: in expansion of macro 'PAGE_POISON_PATTERN'
  memset(ptr, PAGE_POISON_PATTERN, size);
  ^~~

vim +159 include/linux/page-flags.h

   158  
 > 159  #define PAGE_POISON_PATTERN ~0ul
   160  static inline int PagePoisoned(const struct page *page)
   161  {
   162  return page->flags == PAGE_POISON_PATTERN;
   163  }
   164  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH v3 18/18] arm64: Kill PSCI_GET_VERSION as a variant-2 workaround

2018-02-01 Thread Hanjun Guo
Hi Marc,

Thank you for keeping me in the loop, just minor comments below.

On 2018/2/1 19:46, Marc Zyngier wrote:
> Now that we've standardised on SMCCC v1.1 to perform the branch
> prediction invalidation, let's drop the previous band-aid.
> If vendors haven't updated their firmware to do SMCCC 1.1, they
> haven't updated PSCI either, so we don't lose anything.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kernel/bpi.S| 24 ---
>  arch/arm64/kernel/cpu_errata.c | 43 
> --
>  arch/arm64/kvm/hyp/switch.c| 14 --
>  3 files changed, 12 insertions(+), 69 deletions(-)
> 
> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
> index fdeed629f2c6..e5de33513b5d 100644
> --- a/arch/arm64/kernel/bpi.S
> +++ b/arch/arm64/kernel/bpi.S
> @@ -54,30 +54,6 @@ ENTRY(__bp_harden_hyp_vecs_start)
>   vectors __kvm_hyp_vector
>   .endr
>  ENTRY(__bp_harden_hyp_vecs_end)
> -ENTRY(__psci_hyp_bp_inval_start)
> - sub sp, sp, #(8 * 18)
> - stp x16, x17, [sp, #(16 * 0)]
> - stp x14, x15, [sp, #(16 * 1)]
> - stp x12, x13, [sp, #(16 * 2)]
> - stp x10, x11, [sp, #(16 * 3)]
> - stp x8, x9, [sp, #(16 * 4)]
> - stp x6, x7, [sp, #(16 * 5)]
> - stp x4, x5, [sp, #(16 * 6)]
> - stp x2, x3, [sp, #(16 * 7)]
> - stp x0, x1, [sp, #(16 * 8)]
> - mov x0, #0x8400
> - smc #0
> - ldp x16, x17, [sp, #(16 * 0)]
> - ldp x14, x15, [sp, #(16 * 1)]
> - ldp x12, x13, [sp, #(16 * 2)]
> - ldp x10, x11, [sp, #(16 * 3)]
> - ldp x8, x9, [sp, #(16 * 4)]
> - ldp x6, x7, [sp, #(16 * 5)]
> - ldp x4, x5, [sp, #(16 * 6)]
> - ldp x2, x3, [sp, #(16 * 7)]
> - ldp x0, x1, [sp, #(16 * 8)]
> - add sp, sp, #(8 * 18)
> -ENTRY(__psci_hyp_bp_inval_end)
>  
>  ENTRY(__qcom_hyp_sanitize_link_stack_start)
>   stp x29, x30, [sp, #-16]!
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index 9e77809a3b23..b8279a11f57b 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -67,7 +67,6 @@ static int cpu_enable_trap_ctr_access(void *__unused)
>  DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
>  
>  #ifdef CONFIG_KVM
> -extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
>  extern char __qcom_hyp_sanitize_link_stack_start[];
>  extern char __qcom_hyp_sanitize_link_stack_end[];
>  extern char __smccc_workaround_1_smc_start[];
> @@ -116,8 +115,6 @@ static void __install_bp_hardening_cb(bp_hardening_cb_t 
> fn,
>   spin_unlock(&bp_lock);
>  }
>  #else
> -#define __psci_hyp_bp_inval_startNULL
> -#define __psci_hyp_bp_inval_end  NULL
>  #define __qcom_hyp_sanitize_link_stack_start NULL
>  #define __qcom_hyp_sanitize_link_stack_end   NULL
>  #define __smccc_workaround_1_smc_start   NULL
> @@ -164,14 +161,15 @@ static void call_hvc_arch_workaround_1(void)
>   arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_WORKAROUND_1, NULL);
>  }
>  
> -static bool check_smccc_arch_workaround_1(const struct 
> arm64_cpu_capabilities *entry)
> +static int smccc_arch_workaround_1(void *data)
>  {
> + const struct arm64_cpu_capabilities *entry = data;
>   bp_hardening_cb_t cb;
>   void *smccc_start, *smccc_end;
>   struct arm_smccc_res res;
>  
>   if (!entry->matches(entry, SCOPE_LOCAL_CPU))

entry->matches() will be called twice in this function; the other call
is in install_bp_hardening_cb() below. But install_bp_hardening_cb()
will be called in qcom_enable_link_stack_sanitization(), and that is in
the init path, so I think it's fine to keep as it is now.

> - return false;
> + return 0;
>  
>   if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
>   return false;

return 0;

> @@ -181,7 +179,7 @@ static bool check_smccc_arch_workaround_1(const struct 
> arm64_cpu_capabilities *e
>   arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> ARM_SMCCC_ARCH_WORKAROUND_1, &res);
>   if (res.a0)
> - return false;
> + return 0;
>   cb = call_hvc_arch_workaround_1;
>   smccc_start = __smccc_workaround_1_hvc_start;
>   smccc_end = __smccc_workaround_1_hvc_end;
> @@ -191,35 +189,18 @@ static bool check_smccc_arch_workaround_1(const struct 
> arm64_cpu_capabilities *e
>   arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
> ARM_SMCCC_ARCH_WORKAROUND_1, &res);
>   if (res.a0)
> - return false;
> + return 0;
>   cb = call_smc_arch_workaround_1;
>   smccc_start = __smccc_workaround_1_smc_start;
>   smccc_end = __smccc_workaround_1_smc_end;
>   break;
>  
>   def

Re: [PATCH v2 0/2] x86/apic/kexec: Legacy irq setting fixs in kdump kernel

2018-02-01 Thread Dou Liyang

Hi Baoquan,

At 01/25/2018 10:11 PM, Baoquan He wrote:

This is v2 post.

In v2, no code change, just improve change log of patch 1 and 2.

And drop the old patch 3 in v1, a clean up patch. The current
x86_io_apic_ops.disable() hook is still needed by irq remapping.

Baoquan He (2):


Test these patches in both 32bit and 64bit of x86 arches. feel free to
add my Tested-by.

Tested-by: Dou Liyang 

Thanks,
dou.


   x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
 kernel
   x86/apic: Set up through-local-APIC on boot CPU's LINT0 if 'noapic'
 specified

  arch/x86/include/asm/io_apic.h |  3 ++-
  arch/x86/kernel/apic/apic.c|  2 +-
  arch/x86/kernel/apic/io_apic.c | 12 
  arch/x86/kernel/crash.c|  2 +-
  arch/x86/kernel/machine_kexec_32.c | 15 +--
  arch/x86/kernel/machine_kexec_64.c | 15 +--
  arch/x86/kernel/reboot.c   |  2 +-
  7 files changed, 19 insertions(+), 32 deletions(-)






[PATCH v2 07/15] MIPS: memblock: Reserve kdump/crash regions in memblock

2018-02-01 Thread Serge Semin
Kdump/crashkernel memory regions should be reserved in the memblock
allocator so that they won't be occupied by any further allocations.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 72853e94c2c7..b2a5b89ae6b2 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -826,17 +826,15 @@ static void __init arch_mem_init(char **cmdline_p)
if (setup_elfcorehdr && setup_elfcorehdr_size) {
printk(KERN_INFO "kdump reserved memory at %lx-%lx\n",
   setup_elfcorehdr, setup_elfcorehdr_size);
-   reserve_bootmem(setup_elfcorehdr, setup_elfcorehdr_size,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(setup_elfcorehdr, setup_elfcorehdr_size);
}
 #endif
 
mips_parse_crashkernel();
 #ifdef CONFIG_KEXEC
if (crashk_res.start != crashk_res.end)
-   reserve_bootmem(crashk_res.start,
-   crashk_res.end - crashk_res.start + 1,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(crashk_res.start,
+crashk_res.end - crashk_res.start + 1);
 #endif
device_tree_init();
sparse_init();
-- 
2.12.0



[PATCH v2 08/15] MIPS: memblock: Mark present sparsemem sections

2018-02-01 Thread Serge Semin
If sparsemem is activated, all sections with present pages must
be marked accordingly once memblock is fully initialized.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b2a5b89ae6b2..54302319ce1c 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -837,6 +837,11 @@ static void __init arch_mem_init(char **cmdline_p)
 crashk_res.end - crashk_res.start + 1);
 #endif
device_tree_init();
+#ifdef CONFIG_SPARSEMEM
+   for_each_memblock(memory, reg)
+   memory_present(0, memblock_region_memory_base_pfn(reg),
+   memblock_region_memory_end_pfn(reg));
+#endif /* CONFIG_SPARSEMEM */
sparse_init();
plat_swiotlb_setup();
 
-- 
2.12.0



[PATCH v2 06/15] MIPS: memblock: Add reserved memory regions to memblock

2018-02-01 Thread Serge Semin
The memory reservation has to be performed for all the crucial
objects like the kernel itself, its data and the FDT blob. The FDT
reserved-memory nodes should also be scanned to declare or discard
reserved memory regions, but that has to be done after memblock is
fully initialized with low/high RAM (see the function description/code).

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 96 +++-
 1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index cf3674977170..72853e94c2c7 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -362,6 +362,10 @@ static unsigned long __init init_initrd(void)
 static void __init bootmem_init(void)
 {
init_initrd();
+}
+
+static void __init reservation_init(void)
+{
finalize_initrd();
 }
 
@@ -478,60 +482,70 @@ static void __init bootmem_init(void)
memblock_add_node(PFN_PHYS(start), PFN_PHYS(end - start), 0);
}
memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+}
+
+static void __init reservation_init(void)
+{
+   phys_addr_t size;
+   int i;
 
/*
-* Register fully available low RAM pages with the bootmem allocator.
+* Reserve memory occupied by the kernel and its data
 */
-   for (i = 0; i < boot_mem_map.nr_map; i++) {
-   unsigned long start, end, size;
+   size = __pa_symbol(&_end) - __pa_symbol(&_text);
+   memblock_reserve(__pa_symbol(&_text), size);
 
-   start = PFN_UP(boot_mem_map.map[i].addr);
-   end   = PFN_DOWN(boot_mem_map.map[i].addr
-   + boot_mem_map.map[i].size);
+   /*
+* Handle FDT and its reserved-memory nodes now
+*/
+   early_init_fdt_reserve_self();
+   early_init_fdt_scan_reserved_mem();
 
-   /*
-* Reserve usable memory.
-*/
-   switch (boot_mem_map.map[i].type) {
-   case BOOT_MEM_RAM:
-   break;
-   case BOOT_MEM_INIT_RAM:
-   memory_present(0, start, end);
-   continue;
-   default:
-   /* Not usable memory */
-   if (start > min_low_pfn && end < max_low_pfn)
-   reserve_bootmem(boot_mem_map.map[i].addr,
-   boot_mem_map.map[i].size,
-   BOOTMEM_DEFAULT);
-   continue;
-   }
+   /*
+* Reserve requested memory ranges with the memblock allocator.
+*/
+   for (i = 0; i < boot_mem_map.nr_map; i++) {
+   phys_addr_t start, end;
 
-   /*
-* We are rounding up the start address of usable memory
-* and at the end of the usable range downwards.
-*/
-   if (start >= max_low_pfn)
+   if (boot_mem_map.map[i].type == BOOT_MEM_RAM)
continue;
-   if (end > max_low_pfn)
-   end = max_low_pfn;
+
+   start = boot_mem_map.map[i].addr;
+   end   = boot_mem_map.map[i].addr + boot_mem_map.map[i].size;
+   size  = boot_mem_map.map[i].size;
 
/*
-* ... finally, is the area going away?
+* Make sure the region isn't already reserved
 */
-   if (end <= start)
+   if (memblock_is_region_reserved(start, size)) {
+   pr_warn("Reserved region %08zx @ %pa already in-use\n",
+   (size_t)size, &start);
continue;
-   size = end - start;
+   }
 
-   /* Register lowmem ranges */
-   free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
-   memory_present(0, start, end);
+   switch (boot_mem_map.map[i].type) {
+   case BOOT_MEM_ROM_DATA:
+   case BOOT_MEM_RESERVED:
+   case BOOT_MEM_INIT_RAM:
+   memblock_reserve(start, size);
+   break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   default:
+   memblock_remove(start, size);
+   break;
+   }
}
 
/*
 * Reserve initrd memory if needed.
 */
finalize_initrd();
+
+   /*
+* Reserve for hibernation
+*/
+   size = __pa_symbol(&__nosave_end) - __pa_symbol(&__nosave_begin);
+   memblock_reserve(__pa_symbol(&__nosave_begin), size);
 }
 
 #endif /* CONFIG_SGI_IP27 */
@@ -546,6 +560,7 @@ static void __init bootmem_init(void)
  * kernel but generic memory management system is still entirely uninitialized.
  *
  *  o bootmem_init()
+ *  o reservation_init()
  *

[PATCH v2 09/15] MIPS: memblock: Simplify DMA contiguous reservation

2018-02-01 Thread Serge Semin
CMA reserves its areas in the memblock allocator. Since we aren't
using bootmem anymore, copying the reservations over to bootmem
is unnecessary and should be discarded.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 54302319ce1c..158a52c17e29 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -755,7 +755,7 @@ static void __init request_crashkernel(struct resource *res)
 
 static void __init arch_mem_init(char **cmdline_p)
 {
-   struct memblock_region *reg;
+   struct memblock_region *reg __maybe_unused;
extern void plat_mem_setup(void);
 
/* call board setup routine */
@@ -846,10 +846,6 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
-   /* Tell bootmem about cma reserved memblock section */
-   for_each_memblock(reserved, reg)
-   if (reg->size != 0)
-   reserve_bootmem(reg->base, reg->size, BOOTMEM_DEFAULT);
 }
 
 static void __init resource_init(void)
-- 
2.12.0



[PATCH v2 11/15] MIPS: memblock: Perform early low memory test

2018-02-01 Thread Serge Semin
Low memory can be tested at this point, since all the
reservations have just been completed and few additional
allocations have taken place yet.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 531a1471a2a4..a0eac8160750 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -850,6 +850,8 @@ static void __init arch_mem_init(char **cmdline_p)
memblock_allow_resize();
 
memblock_dump_all();
+
+   early_memtest(PFN_PHYS(min_low_pfn), PFN_PHYS(max_low_pfn));
 }
 
 static void __init resource_init(void)
-- 
2.12.0



[PATCH v2 12/15] MIPS: memblock: Print out kernel virtual mem layout

2018-02-01 Thread Serge Semin
It is useful to have the kernel virtual memory layout printed
at boot time so as to have full information about the booted
kernel. In some cases it might be unsafe to have virtual
addresses freely visible in logs, so the %pK format can be used
if one wants to hide them.

Signed-off-by: Serge Semin 
---
 arch/mips/mm/init.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 84b7b592b834..eec92194d4dc 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,6 +61,53 @@ EXPORT_SYMBOL_GPL(empty_zero_page);
 EXPORT_SYMBOL(zero_page_mask);
 
 /*
+ * Print out the kernel virtual memory layout
+ */
+#define MLK(b, t) (void *)b, (void *)t, ((t) - (b)) >> 10
+#define MLM(b, t) (void *)b, (void *)t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) (void *)b, (void *)t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+static void __init mem_print_kmap_info(void)
+{
+#ifdef CONFIG_DEBUG_KERNEL
+   pr_notice("Kernel virtual memory layout:\n"
+ "lowmem  : 0x%px - 0x%px  (%4ld MB)\n"
+ "  .text : 0x%px - 0x%px  (%4td kB)\n"
+ "  .data : 0x%px - 0x%px  (%4td kB)\n"
+ "  .init : 0x%px - 0x%px  (%4td kB)\n"
+ "  .bss  : 0x%px - 0x%px  (%4td kB)\n"
+ "vmalloc : 0x%px - 0x%px  (%4ld MB)\n"
+#ifdef CONFIG_HIGHMEM
+ "pkmap   : 0x%px - 0x%px  (%4ld MB)\n"
+#endif
+ "fixmap  : 0x%px - 0x%px  (%4ld kB)\n",
+ MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(__bss_start, __bss_stop),
+ MLM(VMALLOC_START, VMALLOC_END),
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE)),
+#endif
+ MLK(FIXADDR_START, FIXADDR_TOP));
+
+   /* Check some fundamental inconsistencies. May add something else? */
+#ifdef CONFIG_HIGHMEM
+   BUILD_BUG_ON(VMALLOC_END < PAGE_OFFSET);
+   BUG_ON(VMALLOC_END < (unsigned long)high_memory);
+   BUILD_BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) < PAGE_OFFSET);
+   BUG_ON((PKMAP_BASE) + (LAST_PKMAP)*(PAGE_SIZE) <
+   (unsigned long)high_memory);
+#endif
+   BUILD_BUG_ON(FIXADDR_TOP < PAGE_OFFSET);
+   BUG_ON(FIXADDR_TOP < (unsigned long)high_memory);
+#endif /* CONFIG_DEBUG_KERNEL */
+}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
+
+/*
  * Not static inline because used by IP27 special magic initialization code
  */
 void setup_zero_pages(void)
@@ -468,6 +516,7 @@ void __init mem_init(void)
free_all_bootmem();
setup_zero_pages(); /* Setup zeroed pages.  */
mem_init_free_highmem();
+   mem_print_kmap_info();
mem_init_print_info(NULL);
 
 #ifdef CONFIG_64BIT
-- 
2.12.0



[PATCH v2 10/15] MIPS: memblock: Allow memblock regions resize

2018-02-01 Thread Serge Semin
When all the main reservations are done, the memblock regions
can be dynamically resized. Additionally, it is useful to have
the memblock regions dumped for debugging at this point.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 158a52c17e29..531a1471a2a4 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -846,6 +846,10 @@ static void __init arch_mem_init(char **cmdline_p)
plat_swiotlb_setup();
 
dma_contiguous_reserve(PFN_PHYS(max_low_pfn));
+
+   memblock_allow_resize();
+
+   memblock_dump_all();
 }
 
 static void __init resource_init(void)
-- 
2.12.0



[PATCH v2 13/15] MIPS: memblock: Discard bootmem from Loongson3 code

2018-02-01 Thread Serge Semin
Loongson64/3 runs its own code to initialize the memory allocator
when the NUMA configuration is selected. So, in order to move to
pure memblock usage, we discard the bootmem allocator and insert
memblock reservations for the kernel and addrspace_offset memory
regions.

Signed-off-by: Serge Semin 
---
 arch/mips/loongson64/loongson-3/numa.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/mips/loongson64/loongson-3/numa.c b/arch/mips/loongson64/loongson-3/numa.c
index f17ef520799a..2f1ebf496c17 100644
--- a/arch/mips/loongson64/loongson-3/numa.c
+++ b/arch/mips/loongson64/loongson-3/numa.c
@@ -180,7 +180,6 @@ static void __init szmem(unsigned int node)
 
 static void __init node_mem_init(unsigned int node)
 {
-   unsigned long bootmap_size;
unsigned long node_addrspace_offset;
unsigned long start_pfn, end_pfn, freepfn;
 
@@ -197,26 +196,21 @@ static void __init node_mem_init(unsigned int node)
 
__node_data[node] = prealloc__node_data + node;
 
-   NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
 
-   bootmap_size = init_bootmem_node(NODE_DATA(node), freepfn,
-   start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
if (node == 0) /* used by finalize_initrd() */
max_low_pfn = end_pfn;
 
-   /* This is reserved for the kernel and bdata->node_bootmem_map */
-   reserve_bootmem_node(NODE_DATA(node), start_pfn << PAGE_SHIFT,
-   ((freepfn - start_pfn) << PAGE_SHIFT) + bootmap_size,
-   BOOTMEM_DEFAULT);
+   /* This is reserved for the kernel only */
+   if (node == 0)
+   memblock_reserve(start_pfn << PAGE_SHIFT,
+   ((freepfn - start_pfn) << PAGE_SHIFT));
 
if (node == 0 && node_end_pfn(0) >= (0x >> PAGE_SHIFT)) {
/* Reserve 0xfe00~0x for RS780E integrated GPU */
-   reserve_bootmem_node(NODE_DATA(node),
-   (node_addrspace_offset | 0xfe00),
-   32 << 20, BOOTMEM_DEFAULT);
+   memblock_reserve(node_addrspace_offset | 0xfe00, 32 << 20);
}
 
sparse_memory_present_with_active_regions(node);
-- 
2.12.0



[PATCH v2 15/15] MIPS: memblock: Deactivate bootmem allocator

2018-02-01 Thread Serge Semin
The memblock allocator can be used from now on for early
memory management.

Signed-off-by: Serge Semin 
---
 arch/mips/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 350a990fc719..434f756e03e9 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -4,7 +4,6 @@ config MIPS
default y
select ARCH_BINFMT_ELF_STATE
select ARCH_CLOCKSOURCE_DATA
-   select ARCH_DISCARD_MEMBLOCK
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_MIGHT_HAVE_PC_PARPORT
@@ -57,6 +56,7 @@ config MIPS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_KPROBES
select HAVE_KRETPROBES
+   select NO_BOOTMEM
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MOD_ARCH_SPECIFIC
-- 
2.12.0



[PATCH v2 05/15] MIPS: KASLR: Drop relocatable fixup from reservation_init

2018-02-01 Thread Serge Semin
From: Matt Redfearn 

A recent change ("MIPS: memblock: Discard bootmem initialization")
removed the reservation of all memory below the kernel's _end symbol in
bootmem. This makes the call to free_bootmem unnecessary, since the
memory region is no longer marked reserved.

Additionally, ("MIPS: memblock: Print out kernel virtual mem
layout") added a display of the kernel's virtual memory layout, so
printing the relocation information at this point is redundant.

Remove this section of code.

Signed-off-by: Matt Redfearn 
---
 arch/mips/kernel/setup.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b5fcacf71b3f..cf3674977170 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -528,29 +528,6 @@ static void __init bootmem_init(void)
memory_present(0, start, end);
}
 
-#ifdef CONFIG_RELOCATABLE
-   /*
-* The kernel reserves all memory below its _end symbol as bootmem,
-* but the kernel may now be at a much higher address. The memory
-* between the original and new locations may be returned to the system.
-*/
-   if (__pa_symbol(_text) > __pa_symbol(VMLINUX_LOAD_ADDRESS)) {
-   unsigned long offset;
-   extern void show_kernel_relocation(const char *level);
-
-   offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
-   free_bootmem(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
-
-#if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
-   /*
-* This information is necessary when debugging the kernel
-* But is a security vulnerability otherwise!
-*/
-   show_kernel_relocation(KERN_INFO);
-#endif
-   }
-#endif
-
/*
 * Reserve initrd memory if needed.
 */
-- 
2.12.0



[PATCH v2 03/15] MIPS: memblock: Reserve initrd memory in memblock

2018-02-01 Thread Serge Semin
There is no reserve_bootmem() method in the nobootmem interface,
so we need to replace it with memblock-specific one.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index f502cd702fa7..a015cee353be 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -330,7 +330,7 @@ static void __init finalize_initrd(void)
 
maybe_bswap_initrd();
 
-   reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
+   memblock_reserve(__pa(initrd_start), size);
initrd_below_start_ok = 1;
 
pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
-- 
2.12.0



[PATCH v2 14/15] MIPS: memblock: Discard bootmem from SGI IP27 code

2018-02-01 Thread Serge Semin
SGI IP27 has its own code to set up the early memory allocator,
since it is a NUMA-based system. So, in order to be compatible with
the NO_BOOTMEM config, we need to discard the bootmem allocator
initialization and insert the memblock reservation method. In my
opinion the code isn't working anyway, since I couldn't find a place
where prom_meminit() is called, and kernel memory isn't reserved. It
must have been untested since the time the
arch/mips/mips-boards/generic code was in the kernel.

Signed-off-by: Serge Semin 
---
 arch/mips/sgi-ip27/ip27-memory.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/mips/sgi-ip27/ip27-memory.c b/arch/mips/sgi-ip27/ip27-memory.c
index 59133d0abc83..c480ee3eca96 100644
--- a/arch/mips/sgi-ip27/ip27-memory.c
+++ b/arch/mips/sgi-ip27/ip27-memory.c
@@ -389,7 +389,6 @@ static void __init node_mem_init(cnodeid_t node)
 {
unsigned long slot_firstpfn = slot_getbasepfn(node, 0);
unsigned long slot_freepfn = node_getfirstfree(node);
-   unsigned long bootmap_size;
unsigned long start_pfn, end_pfn;
 
get_pfn_range_for_nid(node, &start_pfn, &end_pfn);
@@ -400,7 +399,6 @@ static void __init node_mem_init(cnodeid_t node)
__node_data[node] = __va(slot_freepfn << PAGE_SHIFT);
memset(__node_data[node], 0, PAGE_SIZE);
 
-   NODE_DATA(node)->bdata = &bootmem_node_data[node];
NODE_DATA(node)->node_start_pfn = start_pfn;
NODE_DATA(node)->node_spanned_pages = end_pfn - start_pfn;
 
@@ -409,12 +407,9 @@ static void __init node_mem_init(cnodeid_t node)
slot_freepfn += PFN_UP(sizeof(struct pglist_data) +
   sizeof(struct hub_data));
 
-   bootmap_size = init_bootmem_node(NODE_DATA(node), slot_freepfn,
-   start_pfn, end_pfn);
free_bootmem_with_active_regions(node, end_pfn);
-   reserve_bootmem_node(NODE_DATA(node), slot_firstpfn << PAGE_SHIFT,
-   ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT) + bootmap_size,
-   BOOTMEM_DEFAULT);
+   memblock_reserve(slot_firstpfn << PAGE_SHIFT,
+   ((slot_freepfn - slot_firstpfn) << PAGE_SHIFT));
sparse_memory_present_with_active_regions(node);
 }
 
-- 
2.12.0



[PATCH v2 04/15] MIPS: memblock: Discard bootmem initialization

2018-02-01 Thread Serge Semin
Since memblock is going to be used for early memory allocation,
let's discard the bootmem node setup and all the related free-space
search code. The low/high PFN extremes still need to be calculated,
since they are used at the paging_init stage. As the current code
already performs the memblock region initialization, the only thing
left is to set the upper allocation limit to the maximum low-memory
PFN, so the memblock API can be fully used from now on.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 86 +++-
 1 file changed, 11 insertions(+), 75 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index a015cee353be..b5fcacf71b3f 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -367,29 +367,15 @@ static void __init bootmem_init(void)
 
 #else  /* !CONFIG_SGI_IP27 */
 
-static unsigned long __init bootmap_bytes(unsigned long pages)
-{
-   unsigned long bytes = DIV_ROUND_UP(pages, 8);
-
-   return ALIGN(bytes, sizeof(long));
-}
-
 static void __init bootmem_init(void)
 {
-   unsigned long reserved_end;
-   unsigned long mapstart = ~0UL;
-   unsigned long bootmap_size;
-   bool bootmap_valid = false;
int i;
 
/*
-* Sanity check any INITRD first. We don't take it into account
-* for bootmem setup initially, rely on the end-of-kernel-code
-* as our memory range starting point. Once bootmem is inited we
+* Sanity check any INITRD first. Once memblock is inited we
 * will reserve the area used for the initrd.
 */
init_initrd();
-   reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
 
/*
 * max_low_pfn is not a number of pages. The number of pages
@@ -428,16 +414,6 @@ static void __init bootmem_init(void)
max_low_pfn = end;
if (start < min_low_pfn)
min_low_pfn = start;
-   if (end <= reserved_end)
-   continue;
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Skip zones before initrd and initrd itself */
-   if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
-   continue;
-#endif
-   if (start >= mapstart)
-   continue;
-   mapstart = max(reserved_end, start);
}
 
if (min_low_pfn >= max_low_pfn)
@@ -463,53 +439,19 @@ static void __init bootmem_init(void)
 #endif
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
-
-#ifdef CONFIG_BLK_DEV_INITRD
-   /*
-* mapstart should be after initrd_end
-*/
-   if (initrd_end)
-   mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
+#ifdef CONFIG_HIGHMEM
+   pr_info("PFNs: low min %lu, low max %lu, high start %lu, high end %lu,"
+   "max %lu\n",
+   min_low_pfn, max_low_pfn, highstart_pfn, highend_pfn, max_pfn);
+#else
+   pr_info("PFNs: low min %lu, low max %lu, max %lu\n",
+   min_low_pfn, max_low_pfn, max_pfn);
 #endif
 
/*
-* check that mapstart doesn't overlap with any of
-* memory regions that have been reserved through eg. DTB
-*/
-   bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
-
-   bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
-   bootmap_size);
-   for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
-   unsigned long mapstart_addr;
-
-   switch (boot_mem_map.map[i].type) {
-   case BOOT_MEM_RESERVED:
-   mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
-   boot_mem_map.map[i].size);
-   if (PHYS_PFN(mapstart_addr) < mapstart)
-   break;
-
-   bootmap_valid = memory_region_available(mapstart_addr,
-   bootmap_size);
-   if (bootmap_valid)
-   mapstart = PHYS_PFN(mapstart_addr);
-   break;
-   default:
-   break;
-   }
-   }
-
-   if (!bootmap_valid)
-   panic("No memory area to place a bootmap bitmap");
-
-   /*
-* Initialize the boot-time allocator with low memory only.
+* Initialize the boot-time allocator with low/high memory, but
+* set the allocation limit to low memory only
 */
-   if (bootmap_size != init_bootmem_node(NODE_DATA(0), mapstart,
-min_low_pfn, max_low_pfn))
-   panic("Unexpected memory size required for bootmap");
-
for (i = 0; i < boot_mem_map.nr_map; i++) {
unsigned long start, end;
 
@@ -535,6 +477,7 @@ static void __init

[PATCH v2 02/15] MIPS: memblock: Surely map BSS kernel memory section

2018-02-01 Thread Serge Semin
The current MIPS code makes sure the kernel code/data/init
sections are in the memory maps, but BSS should also be there.

Signed-off-by: Serge Semin 
---
 arch/mips/kernel/setup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 1a4d64410303..f502cd702fa7 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -845,6 +845,9 @@ static void __init arch_mem_init(char **cmdline_p)
arch_mem_addpart(PFN_UP(__pa_symbol(&__init_begin)) << PAGE_SHIFT,
 PFN_DOWN(__pa_symbol(&__init_end)) << PAGE_SHIFT,
 BOOT_MEM_INIT_RAM);
+   arch_mem_addpart(PFN_DOWN(__pa_symbol(&__bss_start)) << PAGE_SHIFT,
+PFN_UP(__pa_symbol(&__bss_stop)) << PAGE_SHIFT,
+BOOT_MEM_RAM);
 
pr_info("Determined physical RAM map:\n");
print_memory_map();
-- 
2.12.0



[PATCH v2 01/15] MIPS: memblock: Add RESERVED_NOMAP memory flag

2018-02-01 Thread Serge Semin
Even if the nomap flag is specified, the reserved memory declared in
the DTS isn't really discarded from the buddy allocator in the
current code. We'll fix that by adding a no-map MIPS memory flag.
Additionally, let's add RESERVED_NOMAP memory region handling to the
methods which aren't going to be changed in the later patches.

Signed-off-by: Serge Semin 
---
 arch/mips/include/asm/bootinfo.h | 1 +
 arch/mips/kernel/prom.c  | 8 ++--
 arch/mips/kernel/setup.c | 8 
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/mips/include/asm/bootinfo.h b/arch/mips/include/asm/bootinfo.h
index e26a093bb17a..276618b5319d 100644
--- a/arch/mips/include/asm/bootinfo.h
+++ b/arch/mips/include/asm/bootinfo.h
@@ -90,6 +90,7 @@ extern unsigned long mips_machtype;
 #define BOOT_MEM_ROM_DATA  2
 #define BOOT_MEM_RESERVED  3
 #define BOOT_MEM_INIT_RAM  4
+#define BOOT_MEM_RESERVED_NOMAP5
 
 /*
  * A memory map that's built upon what was determined
diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index 0dbcd152a1a9..b123eb8279cd 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -41,7 +41,7 @@ char *mips_get_machine_name(void)
 #ifdef CONFIG_USE_OF
 void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 {
-   return add_memory_region(base, size, BOOT_MEM_RAM);
+   add_memory_region(base, size, BOOT_MEM_RAM);
 }
 
 void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
@@ -52,7 +52,11 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
 int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
-   add_memory_region(base, size, BOOT_MEM_RESERVED);
+   if (!nomap)
+   add_memory_region(base, size, BOOT_MEM_RESERVED);
+   else
+   add_memory_region(base, size, BOOT_MEM_RESERVED_NOMAP);
+
return 0;
 }
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 702c678de116..1a4d64410303 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -172,6 +172,7 @@ bool __init memory_region_available(phys_addr_t start, phys_addr_t size)
in_ram = true;
break;
case BOOT_MEM_RESERVED:
+   case BOOT_MEM_RESERVED_NOMAP:
if ((start >= start_ && start < end_) ||
(start < start_ && start + size >= start_))
free = false;
@@ -207,6 +208,9 @@ static void __init print_memory_map(void)
case BOOT_MEM_RESERVED:
printk(KERN_CONT "(reserved)\n");
break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   printk(KERN_CONT "(reserved nomap)\n");
+   break;
default:
printk(KERN_CONT "type %lu\n", boot_mem_map.map[i].type);
break;
@@ -955,9 +959,13 @@ static void __init resource_init(void)
res->name = "System RAM";
res->flags |= IORESOURCE_SYSRAM;
break;
+   case BOOT_MEM_RESERVED_NOMAP:
+   res->name = "reserved nomap";
+   break;
case BOOT_MEM_RESERVED:
default:
res->name = "reserved";
+   break;
}
 
request_resource(&iomem_resource, res);
-- 
2.12.0



[PATCH v2 00/15] MIPS: memblock: Switch arch code to NO_BOOTMEM

2018-02-01 Thread Serge Semin
Even though it's common to see architecture code using both the
bootmem and memblock early memory allocators, that's not good for
multiple reasons. First of all, it's redundant to have two early
memory allocators when one would be more than enough from the
functionality and stability points of view. Secondly, some new
features introduced in the kernel utilize the methods of the more
modern allocator and ignore the older one. This means the
architecture code must keep both subsystems synchronized with
information about memory regions and reservations, which increases
code complexity and, with it, the probability of bugs. Finally, it's
better to keep all the architectures' code unified for readability
and simplicity. All these reasons lead to one conclusion: arch code
should use just one memory allocator, which should be memblock as the
most modern one, already utilized by most of the kernel's platforms.
This patchset is mostly about that.

One more reason why the MIPS arch code should finally move to
memblock is a BUG somewhere in the initialization process, when
CMA is activated:

[0.248762] BUG: Bad page state in process swapper/0  pfn:01f93
[0.255415] page:8205b0ac count:0 mapcount:-127 mapping:  (null) index:0x1
[0.263172] flags: 0x4000()
[0.266723] page dumped because: nonzero mapcount
[0.272049] Modules linked in:
[0.275511] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.88-module #5
[0.282900] Stack :   80b6dd6a 003a   
8093 8092bff4
  86073a14 80ac88c7 809f21ac  0001 80b6998c 0400 

  80a0 801822e8 80b6dd68  0002  809f8024 
86077ccc
  80b8 801e9328 809fcbc0  0400 0001 86077ccc 
86073a14
         

  ...
[0.323148] Call Trace:
[0.325935] [<8010e7c4>] show_stack+0x8c/0xa8
[0.330859] [<80404814>] dump_stack+0xd4/0x110
[0.335879] [<801f0bc0>] bad_page+0xfc/0x14c
[0.340710] [<801f0e04>] free_pages_prepare+0x1f4/0x330
[0.346632] [<801f36c4>] __free_pages_ok+0x2c/0x104
[0.352154] [<80b23a40>] init_cma_reserved_pageblock+0x5c/0x74
[0.358761] [<80b29390>] cma_init_reserved_areas+0x1b4/0x240
[0.365170] [<8010058c>] do_one_initcall+0xe8/0x27c
[0.370697] [<80b14e60>] kernel_init_freeable+0x200/0x2c4
[0.376828] [<808faca4>] kernel_init+0x14/0x104
[0.381939] [<80107598>] ret_from_kernel_thread+0x14/0x1c

The bogus pfn seems to be one allocated for the bootmem allocator's
pages that hasn't been freed before CMA starts working with its
areas. Anyway, the bug is solved by this patchset.

Another reason why this patchset is useful is that it fixes the FDT
reserved-memory node functionality for MIPS. It really is a bug to
scan the FDT reserved-memory nodes before memblock is fully
initialized (calling early_init_fdt_scan_reserved_mem before
bootmem_init is called). Additionally, the no-map flag of the
reserved-memory nodes hasn't been taken into account. This patchset
fixes all of these.

As you probably remember, I already made another attempt to merge
similar functionality into the kernel. This time the patchset turned
out to be less complex (14 patches vs 21 last time) and fixes the
platform code, like SGI IP27 and Loongson3, which, being NUMA-based,
introduce their own memory initialization process. Although I have
much doubt about the SGI IP27 code's operability in the first place,
since it has a prom_meminit() early memory initialization method
which isn't called from anywhere else in the kernel. It must have
been left there unrenamed after the arch/mips/mips-boards/generic
code was discarded.

Here is the list of folks who agreed to perform some tests of
the patchset:
Alexander Sverdlin  - Octeon2
Matt Redfearn  - Loongson3, etc
Joshua Kinard  - IP27
Marcin Nowakowski 
Thanks to all of you, and to everybody who will be involved in
reviewing and testing.

The patchset is applied on top of kernel 4.15-rc8 and can be found
submitted at my repo:
https://github.com/fancer/Linux-kernel-MIPS-memblock-project

So far the patchset has been successfully tested on the platforms:
UTM8 (Cavium Octeon III)
Creator CI20
Creator CI40
Loongson3a
MIPS Boston
MIPS Malta
MIPS SEAD3
Octeon2

Changelog v2:
- Hide mem_print_kmap_info() behind CONFIG_DEBUG_KERNEL and replace
  %pK with %px there (requested by Matt Redfearn)
- Drop relocatable fixup from reservation_init (patch from Matt Redfearn)
- Move __maybe_unused change from patch 7 to patch 8 (requested by Marcin Nowakowski)
- Add tested platforms to the cover letter

Signed-off-by: Serge Semin 
Tested-by: Matt Redfearn 
Tested-by: Alexander Sverdlin 

Matt Redfearn (1):
  MIPS: KASLR: Drop relocatable fixup from reservation_init

Serge Semin (14):
  MIPS: memblock: Add RESERVED_NOMAP memory flag
  MIPS: memblock: Surely map BSS kernel memory section
  MIPS: memblock: 

[PATCH] checkpatch: Improve OPEN_BRACE test

2018-02-01 Thread Joe Perches
Some structure definitions that use macros trip the OPEN_BRACE test.

e.g. +struct bpf_map_def SEC("maps") control_map = {

Improve the test by using $balanced_parens instead of a .*

Miscellanea:

o Use $sline so any comments are ignored
o Correct the message output from declaration to definition
o Remove unnecessary parentheses

Signed-off-by: Joe Perches 
Reported-by: Song Liu 
---
 scripts/checkpatch.pl | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 78e7a310af46..3d4040322ae1 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3919,10 +3919,12 @@ sub process {
 
 # function brace can't be on same line, except for #defines of do while,
 # or if closed on same line
-   if (($line=~/$Type\s*$Ident\(.*\).*\s*{/) and
-   !($line=~/\#\s*define.*do\s\{/) and !($line=~/}/)) {
+   if ($^V && $^V ge 5.10.0 &&
+   $sline =~ /$Type\s*$Ident\s*$balanced_parens\s*\{/ &&
+   $sline !~ /\#\s*define\b.*do\s*\{/ &&
+   $sline !~ /}/) {
if (ERROR("OPEN_BRACE",
- "open brace '{' following function declarations go on the next line\n" . $herecurr) &&
+ "open brace '{' following function definitions go on the next line\n" . $herecurr) &&
$fix) {
fix_delete_line($fixlinenr, $rawline);
my $fixed_line = $rawline;
-- 
2.15.0
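The gist of the fix above is that the old `.*` inside the parentheses over-matches: in `struct bpf_map_def SEC("maps") control_map = {` it happily swallows everything between `SEC(` and the final `{`, so the line looks like a function definition. Matching a *balanced* parenthesis group and then requiring the brace to follow immediately avoids that. The following is a rough illustrative sketch of that distinction in Python (checkpatch itself uses a Perl 5.10 recursive regex for `$balanced_parens`; the function names here are made up):

```python
import re

def balanced_parens(s, start):
    """Return the index just past the ')' matching the '(' at `start`,
    or -1 if the parentheses never balance."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == '(':
            depth += 1
        elif s[i] == ')':
            depth -= 1
            if depth == 0:
                return i + 1
    return -1

def open_brace_on_definition(line):
    """Roughly mimic the tightened heuristic: an identifier, a balanced
    parenthesis group, then nothing but whitespace before the '{'."""
    m = re.search(r'\b[A-Za-z_]\w*\s*\(', line)
    if not m:
        return False
    end = balanced_parens(line, m.end() - 1)
    if end < 0:
        return False
    return re.match(r'\s*\{', line[end:]) is not None

# The old-style pattern with '.*' between ')' and '{' still matches the
# bpf map definition, which is exactly the reported false positive:
OLD_PATTERN = r'\w+\(.*\).*\{'
```

With these, `int foo(void) {` is flagged, while `struct bpf_map_def SEC("maps") control_map = {` is not, because `control_map = ` sits between the balanced `("maps")` group and the brace.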



Re: [PATCH] of: cache phandle nodes to decrease cost of of_find_node_by_phandle()

2018-02-01 Thread Rob Herring
On Thu, Feb 1, 2018 at 3:09 PM, Frank Rowand  wrote:
> On 02/01/18 06:24, Rob Herring wrote:
>> On Wed, Jan 31, 2018 at 3:43 PM, Frank Rowand  wrote:
>>> On 01/31/18 12:05, frowand.l...@gmail.com wrote:
 From: Frank Rowand 

 Create a cache of the nodes that contain a phandle property.  Use this
 cache to find the node for a given phandle value instead of scanning
 the devicetree to find the node.  If the phandle value is not found
 in the cache, of_find_node_by_phandle() will fall back to the tree
 scan algorithm.

 The cache is initialized in of_core_init().

 The cache is freed via a late_initcall_sync().

 Signed-off-by: Frank Rowand 
 ---

 Some of_find_by_phandle() calls may occur before the cache is
 initialized or after it is freed.  For example, for the qualcomm
 qcom-apq8074-dragonboard, 11 calls occur before the initialization
 and 80 occur after the cache is freed (out of 516 total calls.)


 drivers/of/base.c   | 85 ++---
  drivers/of/of_private.h |  5 +++
  drivers/of/resolver.c   | 21 
  3 files changed, 86 insertions(+), 25 deletions(-)
>>>
>>> Some observations
>>>
>>> The size of the cache for a normal device tree would be a couple of
>>> words of overhead for the cache, plus one pointer per devicetree node
>>> that contains a phandle property.  This will be less space than
>>> would be used by adding a hash field to each device node.  It is
>>> also less space than was used by the older algorithm (long gone)
>>> that added a linked list through the nodes that contained a
>>> phandle property.
>>>
>>> This is assuming that the values of the phandle properties are
>>> the default ones created by the dtc compiler.  In the case
>>> where a very large phandle property value is hand-coded in
>>> a devicetree source, the size of the cache is capped at one
>>> entry per node.  In this case, a little bit of space will be
>>> wasted -- but this is just a sanity fallback, it should not
>>> be encountered, and can be fixed by fixing the devicetree
>>> source.
>>
>> I don't think we should rely on how dtc allocates phandles. dtc is not
>> the only source of DeviceTrees. If we could do that, then lets make
>
> It seems like a reasonable thing to rely on.  dtc is the in-tree
> compiler to create an FDT.
>
> Are you thinking about the IBM PPC devicetrees as devicetrees created
> in some manner other than dtc?  Are there other examples you are
> aware of?

There's that and any other platform with real OF. There's also the BSD
implementation of dtc.

> If non-dtc tools create phandle property values that are not a
> contiguous range of values starting with one, then the devicetrees
> they create may not benefit from this performance optimization.
> But no user of such a devicetree is complaining about performance
> issues with of_find_node_by_phandle() against their tree.  So until
> there is an issue, no big deal.

All I'm really saying is mask the low bits like I did. Then it works
equally well for any continuous range. Yes, someone could allocate in
multiples of 1024 or something and it wouldn't work well (still works,
but misses). Then we're really only debating dynamically sizing it and
whether to free it.
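The low-bit-masking scheme described above amounts to a small direct-mapped cache: with dtc-style contiguous phandles 1..n, every value up to the cache size lands in its own slot, and anything else degrades gracefully to a miss plus the existing tree scan. An illustrative sketch (not the kernel implementation; `CACHE_SIZE` and the names are made up):

```python
CACHE_SIZE = 64            # power of two; the real sizing policy would differ
MASK = CACHE_SIZE - 1

cache = [None] * CACHE_SIZE    # each slot holds (phandle, node) or None

def cache_lookup(phandle, slow_scan):
    """Direct-mapped lookup keyed on the low bits of the phandle.
    On a miss, fall back to the O(n) tree scan and fill the slot."""
    slot = phandle & MASK
    entry = cache[slot]
    if entry is not None and entry[0] == phandle:
        return entry[1]            # hit: no tree walk needed
    node = slow_scan(phandle)      # miss (or collision): full scan
    if node is not None:
        cache[slot] = (phandle, node)
    return node
```

Two phandles that differ only above the mask (e.g. `5` and `5 + CACHE_SIZE`) collide and evict each other, so sparse or hand-coded phandle values still work correctly, just without the speedup.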

> If my effort to create a new version of the FDT, I would like to
> include a rule to the effect of "phandle property values created
> by the compiler _should_ be in the range of 1..n, where n is the
> number of phandle properties in the tree".  That would provide
> some assurance of future trees being able to benefit from this
> specific optimization.

Did you think of that before this issue? :)

> Also, this specific implementation to decrease the cost of
> of_find_node_by_phandle() is just an implementation, not an
> architecture.  Other implementations to achieve the same goal
> have existed in the past, and yet other methods could replace
> this one in the future if needed.
>
>
>> them have some known flag in the upper byte so we have some hint for
>> phandle values. 2^24 phandles should be enough for anyone.TM
>
> I don't understand.  What is the definition of the flag?  A flag
> that says the phandle property values are in the range of 1..n,
> where n is the number of phandle properties in the tree?

If we defined that phandles have values of say "0xABxx", then we
could use that for parsing properties without looking up #*-cells.
Maybe you encode the cell count too. Yes, you'd have to handle
possible collisions, but it would be better than nothing. My point is
that we don't do this because then we'd be making assumptions on
phandle values. We can't make assumptions because the dtbs already
exist and dtc is only one of the things generating phandles I can
change.

>> Your cache size is also going to balloon if the dtb was built with
>> '-@'.
>
> "Balloon" is a bit strong.  Worst case is one entry per node,
> which is c

Re: [PATCH v2 16/16] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-02-01 Thread Hanjun Guo
On 2018/2/1 16:53, Marc Zyngier wrote:
[...]
 ... and actually, perhaps it makes sense for the
 SMCCC_ARCH_WORKAROUND_1 check to be completely independent of MIDR
 based errata matching?

 I.e., if SMCCC v1.1 and SMCCC_ARCH_WORKAROUND_1 are both implemented,
 we should probably invoke it even if the MIDR is not known to belong
 to an affected implementation.
>>>
>>> This would have an impact on big-little systems, for which there is
>>> often a bunch of unaffected CPUs.
>>
>> I think it's what we are doing now, SMCCC v1.1 didn't provide the ability
>> to report per-cpu SMCCC_ARCH_WORKAROUND_1, and it said:
>>  - The discovery call must return the same result on all PEs in the system.
>>  - In heterogeneous systems with some PEs that require mitigation and others
>>that do not, the firmware must provide a safe implementation of this
>>function on all PEs.
>>
>> So from the spec that it's the firmware to take care of unaffected CPUs,
>> to the kernel it's the same.
> 
> The spec makes it safe. The MIDR list makes it fast.

Got it, thank you for clarifying this.

Thanks
Hanjun



Re: [PATCH V4 2/2] ARM: dts: imx7s: add snvs rtc clock

2018-02-01 Thread Shawn Guo
On Tue, Jan 09, 2018 at 05:52:06PM +0800, Anson Huang wrote:
> Add i.MX7 SNVS RTC clock.
> 
> Signed-off-by: Anson Huang 

Looks fine to me.  Ping me when clk driver part lands mainline.

Shawn

