date:20181101

Re: [PATCH 0/5] Implement devm_of_clk_add_provider

2018-11-01 Thread Ricardo Ribalda Delgado

Hi Stephen
On Fri, Nov 2, 2018 at 12:35 AM Stephen Boyd  wrote:
>
> Quoting Ricardo Ribalda Delgado (2018-11-01 07:40:39)
> > Alan Tull reported that there might be a great amount of drivers with
> > imbalance on clk_add_provider. This is an issue for Device tree overlays
> > (and also a bug) https://lkml.org/lkml/2018/10/18/1103
> >
> > This patchset implements a devm_ version of of_clk_add_provider, and
> > fixes 3 drivers.
> >
> > Drivers like clk-gpio will be easily fixed with coccinelle if this set
> > is accepted. (I volunteer, I want to learn how to use it, just seen the
> > great presentations from Julia).
>
> We already have devm_of_clk_add_hw_provider(), so any instances of
> of_clk_add_provider() should be replaced with that, instead of
> propagating the usage of of_clk_add_provider() any further. I'll gladly
> apply patches to convert drivers from struct clk based APIs to struct
> clk_hw based APIs so that we can clearly split clk providers from clk
> consumers. So if you're interested in working on some coccinelle script
> for that it would be great!
>

Will look into that.
Can you take a look at 1/5 of this patchset? I believe that it is
valid even if we do not take 2-5.

Cheers

-- 
Ricardo Ribalda

Re: s390: runtime warning about pgtables_bytes

2018-11-01 Thread Martin Schwidefsky

On Wed, 31 Oct 2018 15:57:54 -0400
Joe Lawrence  wrote:

> On Fri, Oct 12, 2018 at 05:08:33PM +0200, Martin Schwidefsky wrote:
> > On Thu, 11 Oct 2018 15:02:11 +0200
> > Martin Schwidefsky  wrote:
> >   
> > > On Thu, 11 Oct 2018 18:04:12 +0800
> > > Li Wang  wrote:
> > >   
> > > > > When running s390 system with LTP/cve-2017-17052.c[1], the following BUG
> > > > > comes out repeatedly.
> > > > > I remember this warning started with kernel-4.16.0 and it still exists in
> > > > > kernel-4.19-rc7.
> > > > > Can anyone take a look?
> > > > 
> > > > [ 2678.991496] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.001543] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.002453] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.003256] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.013689] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.024647] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.064408] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > [ 2679.133963] BUG: non-zero pgtables_bytes on freeing mm: 16384
> > > > 
> > > > [1]:
> > > > https://github.com/linux-test-project/ltp/blob/master/testcases/cve/cve-2017-17052.c
> > > > 
> > >  
> > > > Confirmed, I see this bug with cve-2017-17052 on my LPAR as well.
> > > I'll look into it.  
> >  
> > Ok, I think I understand the problem now. This is the patch I am testing
> > right now. It seems to fix the issue, but I had to change common mm
> > code for it.
> > --  
> > > From 9e3bc2e96930206ef1ece377e45224c51aca1799 Mon Sep 17 00:00:00 2001
> > From: Martin Schwidefsky 
> > Date: Fri, 12 Oct 2018 16:32:29 +0200
> > Subject: [RFC][PATCH] s390/mm: fix mis-accounting of pgtable_bytes
> > 
> > > In case a fork or a clone system call fails in copy_process and the error
> > handling does the mmput() at the bad_fork_cleanup_mm label, the following
> > warning messages will appear on the console:
> > 
> > BUG: non-zero pgtables_bytes on freeing mm: 16384
> > 
> > The reason for that is the tricks we play with mm_inc_nr_puds() and
> > mm_inc_nr_pmds() in init_new_context().
> > 
> > A normal 64-bit process has 3 levels of page table, the p4d level and
> > the pud level are folded. On process termination the free_pud_range()
> > function in mm/memory.c will subtract 16KB from pgtable_bytes with a
> > mm_dec_nr_puds() call, but there actually is not really a pud table.
> > > The s390 version of pud_free_tlb() recognizes this and does nothing,
> > > the region-3 table will be freed with the pgd_free() call later on.
> > > But the mm_dec_nr_puds() is done unconditionally, to counteract this
> > > the init_new_context() function has an extra mm_inc_nr_puds() call.
> > 
> > Now with a failed fork or clone the free_pgtables() function is not
> > called, there is no mm_dec_nr_puds() but the mm_inc_nr_puds() has
> > been done which leads to the incorrect pgtable_bytes of 16384.
> > Nothing is broken by this, but the warning is annoying.
> > 
> > To get rid of the warning drop the mm_inc_nr_pmds() & mm_inc_nr_puds()
> > calls from init_new_context(), introduce the mm_pmd_folded(),
> > pmd_pud_folded() and pmd_p4d_folded() helper, and add if-statements
> > to the functions mm_[inc|dec]_nr_[pmds|puds].
> > 
> > Signed-off-by: Martin Schwidefsky 
> > ---
> >  arch/s390/include/asm/mmu_context.h |  5 -
> >  arch/s390/include/asm/pgalloc.h |  6 ++---
> >  arch/s390/include/asm/pgtable.h | 18 +++
> >  arch/s390/include/asm/tlb.h |  6 ++---
> >  include/linux/mm.h  | 44 
> > -
> >  5 files changed, 62 insertions(+), 17 deletions(-)
> > 
> > > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
> > > index dbd689d556ce..ccbb53e22024 100644
> > > --- a/arch/s390/include/asm/mmu_context.h
> > > +++ b/arch/s390/include/asm/mmu_context.h
> > > @@ -46,8 +46,6 @@ static inline int init_new_context(struct task_struct *tsk,
> > >  		mm->context.asce_limit = STACK_TOP_MAX;
> > >  		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
> > >  				   _ASCE_USER_BITS | _ASCE_TYPE_REGION3;
> > > -		/* pgd_alloc() did not account this pud */
> > > -		mm_inc_nr_puds(mm);
> > >  		break;
> > >  	case -PAGE_SIZE:
> > >  		/* forked 5-level task, set new asce with new_mm->pgd */
> > > @@ -63,9 +61,6 @@ static inline int init_new_context(struct task_struct *tsk,
> > >  		/* forked 2-level compat task, set new asce with new mm->pgd */
> > >  		mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
> > >  				   _ASCE_USER_BITS | _ASCE_TYPE_SEGMENT;
> > > -		/* pgd_alloc() did not account this pmd */
> > > -		mm_inc_nr_pmds(mm);
> > > -		mm_inc_nr_puds(mm);
> > >  	}
> > >  	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
> > >  	return 0;
> > dif
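
The accounting imbalance described in the commit message above can be modelled in userspace. This is only a minimal sketch, not kernel code: struct mm_model, the helper names and the `patched` switch are hypothetical stand-ins, and 16384 is simply the value printed in the warning.

```c
#include <stdbool.h>

#define PUD_TABLE_BYTES 16384L	/* value printed in the warning above */

/* Hypothetical stand-in for the relevant parts of struct mm_struct. */
struct mm_model {
	long pgtables_bytes;
	bool pud_folded;	/* true for a 3-level address space */
};

/* Counter helpers; check_folded models the if-statements added by the patch. */
static void mm_inc_puds(struct mm_model *mm, bool check_folded)
{
	if (check_folded && mm->pud_folded)
		return;
	mm->pgtables_bytes += PUD_TABLE_BYTES;
}

static void mm_dec_puds(struct mm_model *mm, bool check_folded)
{
	if (check_folded && mm->pud_folded)
		return;
	mm->pgtables_bytes -= PUD_TABLE_BYTES;
}

/* Bytes still accounted when the mm is freed; non-zero means a warning. */
long leftover_bytes(bool patched, bool fork_failed)
{
	struct mm_model mm = { .pgtables_bytes = 0, .pud_folded = true };

	/* init_new_context(): the old code pre-pays one pud for the later dec. */
	if (!patched)
		mm_inc_puds(&mm, false);

	/* free_pgtables() runs on a normal exit, not on the failed-fork path. */
	if (!fork_failed)
		mm_dec_puds(&mm, patched);

	return mm.pgtables_bytes;
}
```

In this model only the unpatched scheme combined with a failed fork leaves bytes behind, which matches the case the commit message describes.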

[PATCH] apparmor: fix boolreturn.cocci warnings

2018-11-01 Thread kbuild test robot

From: kbuild test robot 

security/apparmor/policy_unpack.c:242:9-10: WARNING: return of 0/1 in function 'unpack_X' with return type bool
security/apparmor/policy_unpack.c:288:9-10: WARNING: return of 0/1 in function 'unpack_nameX' with return type bool
security/apparmor/policy_unpack.c:615:8-9: WARNING: return of 0/1 in function 'unpack_rlimits' with return type bool
security/apparmor/policy_unpack.c:574:8-9: WARNING: return of 0/1 in function 'unpack_secmark' with return type bool
security/apparmor/policy_unpack.c:508:8-9: WARNING: return of 0/1 in function 'unpack_trans_table' with return type bool
security/apparmor/policy_unpack.c:312:10-11: WARNING: return of 0/1 in function 'unpack_u32' with return type bool
security/apparmor/policy_unpack.c:325:10-11: WARNING: return of 0/1 in function 'unpack_u64' with return type bool
security/apparmor/policy_unpack.c:299:10-11: WARNING: return of 0/1 in function 'unpack_u8' with return type bool
security/apparmor/policy_unpack.c:538:8-9: WARNING: return of 0/1 in function 'unpack_xattrs' with return type bool
security/apparmor/policy_unpack.c:969:10-11: WARNING: return of 0/1 in function 'verify_dfa_xindex' with return type bool
security/apparmor/policy_unpack.c:959:9-10: WARNING: return of 0/1 in function 'verify_xindex' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 9caafbe2b4cf ("apparmor: Parse secmark policy")
CC: Matthew Garrett 
Signed-off-by: kbuild test robot 
---

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor.git apparmor-next
head:   566f52ece7bd1099d20dfe2f6f0801896643cf8f
commit: 9caafbe2b4cf4c635826a2832e93cf648605de8b [9/16] apparmor: Parse secmark policy

 policy_unpack.c |   54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

--- a/security/apparmor/policy_unpack.c
+++ b/security/apparmor/policy_unpack.c
@@ -239,11 +239,11 @@ static size_t unpack_u16_chunk(struct aa
 static bool unpack_X(struct aa_ext *e, enum aa_code code)
 {
 	if (!inbounds(e, 1))
-		return 0;
+		return false;
 	if (*(u8 *) e->pos != code)
-		return 0;
+		return false;
 	e->pos++;
-	return 1;
+	return true;
 }
 
 /**
@@ -285,50 +285,50 @@ static bool unpack_nameX(struct aa_ext *
 
 	/* now check if type code matches */
 	if (unpack_X(e, code))
-		return 1;
+		return true;
 
 fail:
 	e->pos = pos;
-	return 0;
+	return false;
 }
 
 static bool unpack_u8(struct aa_ext *e, u8 *data, const char *name)
 {
 	if (unpack_nameX(e, AA_U8, name)) {
 		if (!inbounds(e, sizeof(u8)))
-			return 0;
+			return false;
 		if (data)
 			*data = get_unaligned((u8 *)e->pos);
 		e->pos += sizeof(u8);
-		return 1;
+		return true;
 	}
-	return 0;
+	return false;
 }
 
 static bool unpack_u32(struct aa_ext *e, u32 *data, const char *name)
 {
 	if (unpack_nameX(e, AA_U32, name)) {
 		if (!inbounds(e, sizeof(u32)))
-			return 0;
+			return false;
 		if (data)
 			*data = le32_to_cpu(get_unaligned((__le32 *) e->pos));
 		e->pos += sizeof(u32);
-		return 1;
+		return true;
 	}
-	return 0;
+	return false;
 }
 
 static bool unpack_u64(struct aa_ext *e, u64 *data, const char *name)
 {
 	if (unpack_nameX(e, AA_U64, name)) {
 		if (!inbounds(e, sizeof(u64)))
-			return 0;
+			return false;
 		if (data)
 			*data = le64_to_cpu(get_unaligned((__le64 *) e->pos));
 		e->pos += sizeof(u64);
-		return 1;
+		return true;
 	}
-	return 0;
+	return false;
 }
 
 static size_t unpack_array(struct aa_ext *e, const char *name)
@@ -505,12 +505,12 @@ static bool unpack_trans_table(struct aa
 		if (!unpack_nameX(e, AA_STRUCTEND, NULL))
 			goto fail;
 	}
-	return 1;
+	return true;
 
 fail:
 	aa_free_domain_entries(&profile->file.trans);
 	e->pos = saved_pos;
-	return 0;
+	return false;
 }
 
 static bool unpack_xattrs(struct aa_ext *e, struct aa_profile *profile)
@@ -535,11 +535,11 @@ static bool unpack_xattrs(struct aa_ext
 			goto fail;
 	}
 
-	return 1;
+	return true;
 
 fail:
 	e->pos = pos;
-	return 0;
+	return false;
 }
 
 static bool unpack_secmark(struct aa_ext *e, struct aa_profile *profile)
@@ -571,7 +571,7 @@ static bool unpack_secmark(struct aa_ext
 			goto fail;
 	}
 
-
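
As a standalone illustration of the convention that boolreturn.cocci enforces, here is a hedged userspace sketch of a bounds-checked reader returning bool. struct ext_model, read_u8() and this local inbounds() are hypothetical and only loosely modelled on unpack_u8(); they are not the AppArmor code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor over a byte buffer, loosely modelled on struct aa_ext. */
struct ext_model {
	const uint8_t *pos;
	const uint8_t *end;
};

/* True if at least size bytes remain between pos and end. */
static bool inbounds(const struct ext_model *e, size_t size)
{
	return size <= (size_t)(e->end - e->pos);
}

/* Reads one byte and advances the cursor; returns true/false, never 1/0. */
bool read_u8(struct ext_model *e, uint8_t *data)
{
	if (!inbounds(e, sizeof(uint8_t)))
		return false;
	if (data)
		*data = *e->pos;
	e->pos += sizeof(uint8_t);
	return true;
}
```

The behaviour is the same as with 1/0, since both convert to bool; the change only makes the return type visible at each return statement.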

Re: [PATCH v2 00/11] arch/x86: AMD QoS support

2018-11-01 Thread Jon Masters

On 10/5/18 4:55 PM, Moger, Babu wrote:

> The public specification for this feature is available at
> https://www.amd.com/system/files/TechDocs/56375_Quality_of_Service_Extensions.pdf

404 error

Re: [PATCH] tsc: make calibration refinement more robust

2018-11-01 Thread Daniel Vacek

Hi Thomas,

thanks for checking.

On Thu, Nov 1, 2018 at 4:34 PM, Thomas Gleixner  wrote:
> Daniel,
>
> On Thu, 1 Nov 2018, Daniel Vacek wrote:
>
> Please use 'x86/tsc:' as prefix. git log path/to/file usually gives you a
> reasonable hint about prefixes.

Oh, sure thing. The dmesg always prints 'tsc:' - I somehow stuck to it...

>> -#define MAX_RETRIES     5
>> -#define SMI_TRESHOLD    5
>> +#define MAX_RETRIES    5
>> +#define TSC_THRESHOLD  (tsc_khz >> 5)
>
> This breaks pit_hpet_ptimer_calibrate_cpu() because at that point tsc_khz is 0.

That did not show up with my testing, sorry. I guess
pit_calibrate_tsc() never failed for me. Hmm, actually it looks like
quick_pit_calibrate() does the job for me so
pit_hpet_ptimer_calibrate_cpu() is likely not even called. Would this:

#define TSC_THRESHOLD   (tsc_khz? tsc_khz >> 5: 0x2)

work for you instead? Or alternatively at some point when chasing this
down I used:

#define TSC_THRESHOLD (0x1 + (tsc_khz >> 6))

The first one seems better though. I can send v2 next week if you like it.

--nX

> Thanks,
>
> tglx
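
The first alternative proposed above can be sketched as a plain function. This is only an illustration under stated assumptions: tsc_threshold() is a hypothetical helper, tsc_khz is passed as a parameter instead of being the kernel global, and the fallback value is left to the caller rather than hard-coded.

```c
/*
 * Hypothetical helper, not kernel code. Before calibration tsc_khz is 0,
 * so tsc_khz >> 5 alone would yield a zero threshold; in that case the
 * caller-supplied fallback is used instead.
 */
unsigned int tsc_threshold(unsigned int tsc_khz, unsigned int fallback)
{
	return tsc_khz ? tsc_khz >> 5 : fallback;
}
```

With a calibrated value the threshold is 1/32 of tsc_khz, so it scales with the TSC frequency instead of being a fixed count.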

perf script doesn't dump a normal call trace

2018-11-01 Thread Xin Long

On upstream kernel(4.19) or RHEL-8 kernel(4.18.0):

# perf record -e 'skb:consume_skb' -ag
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.612 MB perf.data (634 samples) ]

# perf script
swapper 0 [009] 274370.117711: skb:consume_skb: skbaddr=0x962c591d5b00
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

kworker/9:1-eve   926 [009] 274370.117729: skb:consume_skb: skbaddr=0x962c591d5b00
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

kworker/9:1-eve   926 [009] 274370.117732: skb:consume_skb: skbaddr=0x962c591d4900
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

swapper 0 [009] 274370.145528: skb:consume_skb: skbaddr=0x962c591d4900
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

kworker/9:1-eve   926 [009] 274370.145545: skb:consume_skb: skbaddr=0x962c591d4900
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

kworker/9:1-eve   926 [009] 274370.145547: skb:consume_skb: skbaddr=0x962c591d5b00
a4abe534 consume_skb+0x64 ([kernel.kallsyms])

swapper 0 [009] 274370.173443: skb:consume_skb: skbaddr=0x962c591d5b00
a4abe534 consume_skb+0x64 ([kernel.kallsyms])


On RHEL-7 kernel(3.10.0):

# perf record -e 'skb:consume_skb' -ag
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.170 MB perf.data (214 samples) ]

# perf script
swapper 0 [001] 69006.726193: skb:consume_skb: skbaddr=0x917db9647900
7fffb3425a00 consume_skb ([kernel.kallsyms])
7fffb34ba3cb arp_process ([kernel.kallsyms])
7fffb34bad65 arp_rcv ([kernel.kallsyms])
7fffb343b6d9 __netif_receive_skb_core ([kernel.kallsyms])
7fffb343b9d8 __netif_receive_skb ([kernel.kallsyms])
7fffb343ba60 netif_receive_skb_internal ([kernel.kallsyms])
7fffb343c6e8 napi_gro_receive ([kernel.kallsyms])
7fffc007d1f5 virtnet_poll ([kernel.kallsyms])
7fffb343c07f net_rx_action ([kernel.kallsyms])
7fffb2ea2f05 __do_softirq ([kernel.kallsyms])
7fffb357a32c call_softirq ([kernel.kallsyms])
7fffb2e30675 do_softirq ([kernel.kallsyms])
7fffb2ea3285 irq_exit ([kernel.kallsyms])
7fffb357b5e6 __irqentry_text_start ([kernel.kallsyms])
7fffb356d362 ret_from_intr ([kernel.kallsyms])
7fffb356c12e default_idle ([kernel.kallsyms])
7fffb2e386f0 arch_cpu_idle ([kernel.kallsyms])
7fffb2efe3ba cpu_startup_entry ([kernel.kallsyms])
7fffb2e59db7 start_secondary ([kernel.kallsyms])
7fffb2e020d5 start_cpu ([kernel.kallsyms])

swapper 0 [001] 69006.754090: skb:consume_skb: skbaddr=0x917db9647100
7fffb3425a00 consume_skb ([kernel.kallsyms])
7fffb34ba3cb arp_process ([kernel.kallsyms])
7fffb34bad65 arp_rcv ([kernel.kallsyms])
7fffb343b6d9 __netif_receive_skb_core ([kernel.kallsyms])
7fffb343b9d8 __netif_receive_skb ([kernel.kallsyms])
7fffb343ba60 netif_receive_skb_internal ([kernel.kallsyms])
7fffb343c6e8 napi_gro_receive ([kernel.kallsyms])
7fffc007d1f5 virtnet_poll ([kernel.kallsyms])
7fffb343c07f net_rx_action ([kernel.kallsyms])
7fffb2ea2f05 __do_softirq ([kernel.kallsyms])
7fffb357a32c call_softirq ([kernel.kallsyms])
7fffb2e30675 do_softirq ([kernel.kallsyms])
7fffb2ea3285 irq_exit ([kernel.kallsyms])
7fffb357b5e6 __irqentry_text_start ([kernel.kallsyms])
7fffb356d362 ret_from_intr ([kernel.kallsyms])
7fffb356c12e default_idle ([kernel.kallsyms])
7fffb2e386f0 arch_cpu_idle ([kernel.kallsyms])
7fffb2efe3ba cpu_startup_entry ([kernel.kallsyms])


Any idea why I can't get a proper call trace on the new kernel?

Thanks.

[GIT PULL] apparmor updates for v4.20

2018-11-01 Thread John Johansen

Hi,


Please pull these apparmor changes for v4.20. 
Thanks!

- John

The following changes since commit fb7d1bcf1602b46f37ada72178516c01a250e434:

  Merge tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci (2018-07-19 11:54:04 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor tags/apparmor-pr-2018-11-01

for you to fetch changes up to 566f52ece7bd1099d20dfe2f6f0801896643cf8f:

  apparmor: clean an indentation issue, remove extraneous space (2018-11-01 22:34:25 -0700)


+ Features/Improvements
  - replace spin_is_locked() with lockdep
  - add base support for secmark labeling and matching

+ Cleanups
  - clean an indentation issue, remove extraneous space
  - remove no-op permission check in policy_unpack
  - fix checkpatch missing spaces error in Parse secmark policy
  - fix network performance issue in aa_label_sk_perm

+ Bug fixes
  - add #ifdef checks for secmark filtering
  - fix an error code in __aa_create_ns()
  - don't try to replace stale label in ptrace checks
  - fix failure to audit context info in build_change_hat
  - check buffer bounds when mapping permissions mask
  - fully initialize aa_perms struct when answering userspace query
  - fix uninitialized value in aa_split_fqname


Arnd Bergmann (1):
  apparmor: add #ifdef checks for secmark filtering

Colin Ian King (1):
  apparmor: clean an indentation issue, remove extraneous space

Dan Carpenter (1):
  apparmor: fix an error code in __aa_create_ns()

Jann Horn (2):
  apparmor: don't try to replace stale label in ptrace access check
  apparmor: don't try to replace stale label in ptraceme check

John Johansen (3):
  apparmor: Fix failure to audit context info in build_change_hat
  apparmor: remove no-op permission check in policy_unpack
  apparmor: fix checkpatch error in Parse secmark policy

Lance Roy (1):
  apparmor: Replace spin_is_locked() with lockdep

Matthew Garrett (3):
  apparmor: Add a wildcard secid
  apparmor: Parse secmark policy
  apparmor: Allow filtering based on secmark policy

Tony Jones (1):
  apparmor: Fix network performance issue in aa_label_sk_perm

Tyler Hicks (2):
  apparmor: Check buffer bounds when mapping permissions mask
  apparmor: Fully initialize aa_perms struct when answering userspace query

Zubin Mithra (1):
  apparmor: Fix uninitialized value in aa_split_fqname

 security/apparmor/apparmorfs.c |   7 +-
 security/apparmor/domain.c |   2 +-
 security/apparmor/file.c   |   5 +-
 security/apparmor/include/cred.h   |   2 +
 security/apparmor/include/net.h|  10 +++
 security/apparmor/include/perms.h  |   3 +-
 security/apparmor/include/policy.h |   3 +
 security/apparmor/include/secid.h  |   3 +
 security/apparmor/lib.c|  23 +--
 security/apparmor/lsm.c| 130 +++--
 security/apparmor/net.c|  83 +--
 security/apparmor/policy.c |   3 +
 security/apparmor/policy_ns.c  |   2 +-
 security/apparmor/policy_unpack.c  |  93 +-
 security/apparmor/secid.c  |   3 +-
 15 files changed, 311 insertions(+), 61 deletions(-)

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-01 Thread Anshuman Khandual

On 10/17/2018 07:39 AM, Andrea Arcangeli wrote:
> Hello Zi,
> 
> On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote:
>> Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() return
>> true for a THP under splitting? Does it cause problems when ARM64’s
>> pmd_present() returns false in the same situation?
Thank you Andrea for such a detailed explanation. It really helped us in
understanding certain subtle details about pmd_present() & pmd_trans_huge().

> 
> !pmd_present means it's a migration entry or swap entry and doesn't
> point to RAM. It means if you do pmd_to_page(*pmd) it will return you
> an undefined result.

Sure, but this needs to be made clear somewhere. Not sure whether it's better
just to add some in-code documentation or to enforce it in generic paths.

> 
> During splitting the physical page is still very well pointed by the
> pmd as long as pmd_trans_huge returns true and you hold the
> pmd_lock.

Agreed, it still does point to a huge page in RAM. So pmd_present() should
just return true in such cases as you have explained above.

> 
> pmd_trans_huge must be true at all times for a transhuge pmd that
> points to a hugepage, or all VM fast paths won't serialize with the

But as Naoya mentioned we should not check for pmd_trans_huge() on swap or
migration entries. If this makes sense, I will be happy to look into this
further and remove/replace pmd_trans_huge() check from affected code paths.

> pmd_lock, that is the only reason why, and it's a very good reason
> because it avoids to take the pmd_lock when walking over non transhuge
> pmds (i.e. when there are no THP allocated).
> 
> Now if we've to keep _PAGE_PSE set and return true in pmd_trans_huge
> at all times, why would you want to make pmd_present return false? How
> could it help if pmd_trans_huge returns true, but pmd_present returns
> false despite pmd_to_page works fine and the pmd is really still
> pointing to the page?

Then what is the difference between pmd_trans_huge() and pmd_present()
if both should return true if the PMD points to a huge page in RAM and
pmd_page() also returns a valid huge page in RAM?

> 
> When userland faults on such pmd !pmd_present it will make the page
> fault take a swap or migration path, but that's the wrong path if the
> pmd points to RAM.
This is a real concern. __handle_mm_fault() does check for a swap entry
(which can only be a migration entry at the moment) and then waits
until the migration is completed.

	if (unlikely(is_swap_pmd(orig_pmd))) {
		VM_BUG_ON(thp_migration_supported() &&
				  !is_pmd_migration_entry(orig_pmd));
		if (is_pmd_migration_entry(orig_pmd))
			pmd_migration_entry_wait(mm, vmf.pmd);
		return 0;
	}

> 
> What we need to do during split is an invalidate of the huge TLB.
> There's no pmd_trans_splitting anymore, so we only clear the present
> bit in the PTE despite pmd_present still returns true (just like
> PROT_NONE, nothing new in this respect). pmd_present never meant the

On arm64, the problem is that pmd_present() is tied with pte_present() which
checks for PTE_VALID (also PTE_PROT_NONE) but which gets cleared during PTE
invalidation. pmd_present() returns false just after the first step of PMD
splitting. So pmd_present() needs to be decoupled from PTE_VALID which is
same as PMD_SECT_VALID and instead should depend upon a pte bit which sticks
around like PAGE_PSE as in case of x86. I am working towards a solution.

> real present bit in the pte was set, it just means the pmd points to
> RAM. It means it doesn't point to swap or migration entry and you can
> do pmd_to_page and it works fine.
> We need to invalidate the TLB by clearing the present bit and by
> flushing the TLB before overwriting the transhuge pmd with the regular
> pte (i.e. to make it non huge). That is actually required by an errata
> (l1 cache aliasing of the same mapping through two different TLB of
> two different sizes broke some old CPU and triggered machine checks).
> It's not something fundamentally necessary from a common code point of

TLB entries mapping the same VA -> PA space with different page sizes might
not co-exist with each other, which requires TLB invalidation. A PMD split
phase initiating a TLB invalidation is not like getting around a CPU HW
problem; it's just that SW should not assume behavior on behalf of the
architecture regarding which TLB entries can co-exist at any point.

> view. It's more risky from an hardware (not software) standpoint and
> before you can get rid of the pmd you need to do a TLB flush anyway to
> be sure CPUs stops using it, so better clear the present bit before
> doing the real costly thing (the tlb flush with IPIs). Clearing the
> present bit during the TLB flush is a cost that gets lost in the noise.

Doing TLB invalidation is not tied to whether present bit is m
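
The x86 semantics discussed in this thread can be modelled with plain bit flags. This is a minimal userspace sketch under stated assumptions: the bit positions follow x86 (present bit 0, PSE bit 7, PROTNONE bit 8), but the MODEL_* and model_* names are hypothetical, and the real kernel helpers handle more cases than this model does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values following the x86 bit layout. */
#define MODEL_PAGE_PRESENT	(UINT64_C(1) << 0)
#define MODEL_PAGE_PSE		(UINT64_C(1) << 7)
#define MODEL_PAGE_PROTNONE	(UINT64_C(1) << 8)

/* "Points to RAM": true if any of present, protnone or PSE is set. */
bool model_pmd_present(uint64_t pmd)
{
	return (pmd & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
		       MODEL_PAGE_PSE)) != 0;
}

/* Simplified transhuge check: only looks at the PSE bit. */
bool model_pmd_trans_huge(uint64_t pmd)
{
	return (pmd & MODEL_PAGE_PSE) != 0;
}
```

In this model a mapped THP (present and PSE set) and a THP under splitting (only PSE set) both test true for the two helpers, while a migration entry with neither bit set tests false for both.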

Re: [RFC] doc: rcu: remove note on smp_mb during synchronize_rcu

2018-11-01 Thread Joel Fernandes

On Thu, Nov 01, 2018 at 09:13:07AM -0700, Paul E. McKenney wrote:
> > > > BTW I do want to discuss about this smp_mb patch above with you at LPC
> > > > if you had time, even though we are removing it from the documentation.
> > > > I thought about it a few times, and I was not able to fully appreciate
> > > > the need for the barrier (that is even assuming that complete() etc did
> > > > not do the right thing).  Specifically I was wondering same thing Peter
> > > > said in the above thread I think that - if that rcu_read_unlock()
> > > > triggered all the spin locking up the tree of nodes, then why is that
> > > > locking not sufficient to prevent reads from the read-side section from
> > > > bleeding out? That would prevent the reader that just unlocked from
> > > > seeing anything that happens _after_ the synchronize_rcu.
> > > 
> > > Actually, I recall an smp_mb() being added, but am not seeing it anywhere
> > > relevant to wait_for_completion().  So I might need to add the smp_mb()
> > > to synchronize_rcu() and remove the patch (retaining the typo fix).  :-/
> > 
> > > No problem, I'm glad at least the patch resurfaced the topic of the potential
> > issue :-)
> 
> And an smp_mb() is needed in Tree RCU's __wait_rcu_gp().  This is
> because wait_for_completion() might get a "fly-by" wakeup, which would
> mean no ordering for code naively thinking that it was ordered after a
> grace period.

Makes sense.

> > > The short form answer is that anything before a grace period on any CPU
> > > must be seen by any CPU as being before anything on any CPU after that
> > > same grace period.  This guarantee requires a rather big hammer.
> > > 
> > > But yes, let's talk at LPC!
> > 
> > Sounds great, looking forward to discussing this.
> 
> Would it make sense to have an RCU-implementation BoF?

Yes, I would very much like that. I also spoke with my colleague Daniel
Colascione and he said he would be interested too.

I think it would make sense also to combine it with other memory-ordering
topics like the memory model and rseq/cpu-opv things that Mathieu was doing
(if it makes sense to combine). But yes, I am definitely interested in an
RCU-implementation BoF session.

> > > > Also about GP memory ordering and RCU-tree-locking, I think you
> > > > mentioned to me that the RCU reader-sections are virtually extended both
> > > > forward and backward and wherever it ends, those paths do heavy-weight
> > > > synchronization that should be sufficient to prevent memory ordering
> > > > issues (such as those you mentioned in the Requirements document). That
> > > > is exactly why we don't need explicit barriers during rcu_read_unlock.
> > > > If I recall I asked you why those are not needed. So that answer made
> > > > sense, but then now on going through the 'Memory Ordering' document, I
> > > > see that you mentioned there is reliance on the locking. Is that
> > > > reliance on locking necessary to maintain ordering then?
> > > 
> > > There is a "network" of locking augmented by smp_mb__after_unlock_lock()
> > > that implements the all-to-all memory ordering mentioned above.  But it
> > > also needs to handle all the possible complete()/wait_for_completion()
> > > races, even those assisted by hypervisor vCPU preemption.
> > 
> > > I see, so it sounds like the lock network is just a partial solution. For
> > > some reason I thought that before complete() was even called on the CPU
> > > executing the callback, all the CPUs would have acquired and released a lock
> > > in the "lock network" at least once thus ensuring the ordering (due to the
> > > fact that the quiescent state reporting has to travel up the tree starting
> > > from the leaves), but I think that's not necessarily true so I see your
> > > point now.
> 
> There is indeed a lock that is unconditionally acquired and released by
> wait_for_completion(), but it lacks the smp_mb__after_unlock_lock() that
> is required to get full-up any-to-any ordering.  And unfortunate timing
> (as well as spurious wakeups) allow the interaction to have only normal
> lock-release/acquire ordering, which does not suffice in all cases.
> 
> SRCU and expedited RCU grace periods handle this correctly.  Only the
> normal grace periods are missing the needed barrier.  The probability of
> failure is extremely low in the common case, which involves all sorts
> of synchronization on the wakeup path.  It would be quite strange (but
> not impossible) for the wait_for_completion() exit path to -not- do
> a full wakeup.  Plus the bug requires a reader before the grace period
> to do a store to some location that post-grace-period code loads from.
> Which is a very rare use case.
> 
> But it still should be fixed.  ;-)
> 
> > Did you feel this will violate condition 1. or condition 2. in
> > "Memory-Barrier Guarantees"? Or both?
> > https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Require

Re: [PATCH v1 6/7] vfio: ap: register guest ISC with GISA and GIB

2018-11-01 Thread kbuild test robot

Hi Pierre,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20181101]
[cannot apply to v4.19]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:    https://github.com/0day-ci/linux/commits/Pierre-Morel/s390-vfio-ap-Using-GISA-for-AP-Interrupt/20181102-010854
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allmodconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=s390

All errors (new ones prefixed by >>):

>> ERROR: "kvm_s390_gisc_unregister" [drivers/s390/crypto/vfio_ap.ko] undefined!
>> ERROR: "kvm_s390_gisc_register" [drivers/s390/crypto/vfio_ap.ko] undefined!
   ERROR: "__node_distance" [drivers/nvme/host/nvme-core.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-11-01 Thread Anshuman Khandual

 

On 10/18/2018 07:47 AM, Naoya Horiguchi wrote:
> On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
>> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>>
>>> On 10/15/2018 06:23 AM, Zi Yan wrote:
>>>> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
>>>>
>>>>> On 10/10/2018 06:13 PM, Zi Yan wrote:
>>>>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
>>>>>>
>>>>>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
>>>>>>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
>>>>>>>> PMD migration entry check)
>>>>>>>>
>>>>>>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
>>>>>>>>
>>>>>>>>> A normal mapped THP page at PMD level should be correctly differentiated
>>>>>>>>> from a PMD migration entry while walking the page table. A mapped THP would
>>>>>>>>> additionally check positive for pmd_present() along with pmd_trans_huge()
>>>>>>>>> as compared to a PMD migration entry. This just adds a new conditional test
>>>>>>>>> differentiating the two while walking the page table.
>>>>>>>>>
>>>>>>>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
>>>>>>>>> Signed-off-by: Anshuman Khandual 
>>>>>>>>> ---
>>>>>>>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
>>>>>>>>> exclusive which makes the current conditional block work for both mapped
>>>>>>>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
>>>>>>>>
>>>>>>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
>>>>>>>
>>>>>>> Not really if we just look at code in the conditional blocks.
>>>>>>
>>>>>> Yeah, I explained it wrong above. Sorry about that.
>>>>>>
>>>>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
>>>>>> thus, it returns true even if the present bit is cleared but PSE bit is set.
>>>>>
>>>>> Okay.
>>>>>
>>>>>> This is done so, because THPs under splitting are regarded as present in the kernel
>>>>>> but not present when a hardware page table walker checks it.
>>>>>
>>>>> Okay.
>>>>>
>>>>>>
>>>>>> For PMD migration entry, which should be regarded as not present, if PSE bit
>>>>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
>>>>>> PMD migration entries will be regarded as present
>>>>>
>>>>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
>>>>> as well. Is there anything which can be done to get around this problem on
>>>>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
>>>>> Otherwise we would revert the condition block order to accommodate both the
>>>>> implementation for pmd_trans_huge() as suggested by Kirill before or just
>>>>> consider this patch forward.
>>>>>
>>>>> Because I am not really sure yet about the idea of getting pmd_present()
>>>>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
>>>>> as suggested by Will. If a PMD is trans huge page or not should not depend on
>>>>> whether it is present or not.
>>>>
>>>> In terms of THPs, we have three cases: a present THP, a THP under splitting,
>>>> and a THP under migration. pmd_present() and pmd_trans_huge() both return true
>>>> for a present THP and a THP under splitting, because they discover _PAGE_PSE bit
>>>
>>> Then how do we differentiate between a mapped THP and a splitting THP.
>>
>> AFAIK, in x86, there is no distinction between a mapped THP and a splitting 
>> THP
>> using helper functions.
>>
>> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a 
>> splitting THP
>> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() 
>> return
>> true as long as _PAGE_PSE bit is set.
>>
>>>
 is set for both cases, whereas they both return false for a THP under 
 migration.
 You want to change them to make pmd_trans_huge() returns true for a THP 
 under migration
 instead of false to help ARM64’s support for THP migration.
>>> I am just trying to understand the rationale behind this semantics and see 
>>> where
>>> it should be fixed.
>>>
>>> I think the fundamental problem here is that THP under split has been 
>>> difficult
>>> to be re-presented through the available helper functions and in turn PTE 
>>> bits.
>>>
>>> The following checks
>>>
>>> 1) pmd_present()
>>> 2) pmd_trans_huge()
>>>
>>> Represent three THP states
>>>
>>> 1) Mapped THP   (pmd_present && pmd_trans_huge)
>>> 2) Splitting THP(pmd_present && pmd_trans_huge)
>>> 3) Migrating THP(!pmd_present && !pmd_trans_huge)
>>>
>>> The problem is if we make pmd_trans_huge() return true for all the three 
>>> states
>>> which sounds logical because they are all still trans huge PMD, then 
>>> pmd_present()
>>> can only represent two states not three as required.
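
For readers following the thread, the x86 behaviour described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: the M_* bit positions, the pmd_model_t type and the model_* function names are all made up for illustration, and only the relationship between the bits is taken from the discussion (pmd_present() tests PRESENT | PROTNONE | PSE, pmd_trans_huge() tests PSE, and a migration entry has neither PRESENT nor PSE set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; only the relationships between them matter. */
#define M_PRESENT   (UINT64_C(1) << 0)
#define M_PSE       (UINT64_C(1) << 7)
#define M_PROTNONE  (UINT64_C(1) << 8)
/* Stand-in for the swap-entry payload of a migration entry (assumption). */
#define M_MIGRATION (UINT64_C(1) << 20)

typedef uint64_t pmd_model_t;

/* Model of the x86 check described in the thread: any of the three bits. */
bool model_pmd_present(pmd_model_t pmd)
{
	return (pmd & (M_PRESENT | M_PROTNONE | M_PSE)) != 0;
}

/* Model of the x86 check described in the thread: PSE bit only. */
bool model_pmd_trans_huge(pmd_model_t pmd)
{
	return (pmd & M_PSE) != 0;
}

/* Walk through the three THP states listed in the mail above. */
void model_check_three_states(void)
{
	pmd_model_t mapped    = M_PRESENT | M_PSE;
	pmd_model_t splitting = M_PSE;          /* present bit cleared */
	pmd_model_t migrating = M_MIGRATION;    /* neither PRESENT nor PSE */

	/* Mapped and splitting THPs look identical to both helpers. */
	assert(model_pmd_present(mapped) && model_pmd_trans_huge(mapped));
	assert(model_pmd_present(splitting) && model_pmd_trans_huge(splitting));

	/* A migration entry fails both checks. */
	assert(!model_pmd_present(migrating));
	assert(!model_pmd_trans_huge(migrating));
}
```

In this model two boolean helpers yield only two distinguishable outcomes for three states, which is the limitation the mail points out.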
>>
>> We are on the same page about repres

[GIT PULL] vfs: fix many problems in vfs clone/dedupe implementation

2018-11-01 Thread Dave Chinner

Hi Linus,

Can you please pull the update containing a rework of the VFS clone and
dedupe file range infrastructure from the tag listed below?

We discovered many issues with these interfaces late in the 4.19
cycle - the worst of them (data corruption, setuid stripping) were
fixed for XFS in 4.19-rc8, but a larger rework of the infrastructure
fixing all the problems was needed. That rework is the contents of
this pull request.

The base tree is 4.19 because there was an unrelated
vfs_clone_file_range API cleanup merged in v4.19-rc7, and combined
with the mods in 4.19-rc8 it was simpler for everyone to base this
work on a tree with all those changes already in it.

There is a simple conflict with your current tree in
Documentation/filesystems/porting. However, if you pull Al's pending
VFS tree before this there will also be a more significant conflict in
fs/read_write.c in the vfs_dedupe_file_range_one() function rework.
The details of the conflict and the resolution that the linux-next
tree is carrying can be found here:

https://lore.kernel.org/lkml/20181031115247.6adcb...@canb.auug.org.au/

If you need any more info or a tree with the conflicts already
resolved, please let me know.

Thanks,

Dave.

PS. Darrick is back up to speed so the next XFS pull request for
fixes later in the -rc cycle will probably come from him again.

The following changes since commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d:

  Linux 4.19 (2018-10-22 07:37:37 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/fs/xfs/xfs-linux tags/xfs-4.20-merge-2

for you to fetch changes up to bf4a1fcf0bc18d52cf0fce6571d6f327ab5eaf22:

  xfs: remove [cm]time update from reflink calls (2018-10-30 10:47:48 +1100)


vfs: rework data cloning infrastructure

Rework the vfs_clone_file_range and vfs_dedupe_file_range infrastructure to use
a common .remap_file_range method and supply generic bounds and sanity checking
functions that are shared with the data write path. The current VFS
infrastructure has problems with rlimit, LFS file sizes, file time stamps,
maximum filesystem file sizes, stripping setuid bits, etc and so they are
addressed in these commits.

We also introduce the ability for the ->remap_file_range methods to return short
clones so that clones for vfs_copy_file_range() don't get rejected if the entire
range can't be cloned. It also allows filesystems to silently skip deduplication
of partial EOF blocks if they are not capable of doing so without requiring
errors to be thrown to userspace.

All existing filesystems are converted to use the new .remap_file_range method,
and both XFS and ocfs2 are modified to make use of the new generic checking
infrastructure.


Darrick J. Wong (28):
  vfs: vfs_clone_file_prep_inodes should return EINVAL for a clone from 
beyond EOF
  vfs: check file ranges before cloning files
  vfs: exit early from zero length remap operations
  vfs: strengthen checking of file range inputs to generic_remap_checks
  vfs: avoid problematic remapping requests into partial EOF block
  vfs: skip zero-length dedupe requests
  vfs: rename vfs_clone_file_prep to be more descriptive
  vfs: rename clone_verify_area to remap_verify_area
  vfs: combine the clone and dedupe into a single remap_file_range
  vfs: pass remap flags to generic_remap_file_range_prep
  vfs: pass remap flags to generic_remap_checks
  vfs: remap helper should update destination inode metadata
  vfs: make remap_file_range functions take and return bytes completed
  vfs: plumb remap flags through the vfs clone functions
  vfs: plumb remap flags through the vfs dedupe functions
  vfs: enable remap callers that can handle short operations
  vfs: hide file range comparison function
  vfs: clean up generic_remap_file_range_prep return value
  ocfs2: truncate page cache for clone destination file before remapping
  ocfs2: fix pagecache truncation prior to reflink
  ocfs2: support partial clone range and dedupe range
  ocfs2: remove ocfs2_reflink_remap_range
  xfs: fix pagecache truncation prior to reflink
  xfs: clean up xfs_reflink_remap_blocks call site
  xfs: support returning partial reflink results
  xfs: remove redundant remap partial EOF block checks
  xfs: remove xfs_reflink_remap_range
  xfs: remove [cm]time update from reflink calls

 Documentation/filesystems/porting |   5 +
 Documentation/filesystems/vfs.txt |  22 ++-
 fs/btrfs/ctree.h  |   8 +-
 fs/btrfs/file.c   |   3 +-
 fs/btrfs/ioctl.c  |  50 ++---
 fs/cifs/cifsfs.c  |  24 ++-
 fs/ioctl.c|  10 +-
 fs/nfs/nfs4file.c |  12 +-
 fs/nfsd/vfs.c |   8 +-
 fs/ocfs2/file.c   |  93 +++--
 fs/ocfs2/refcou

Re: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-11-01 Thread gre...@linuxfoundation.org

On Thu, Nov 01, 2018 at 07:22:28PM +, Dexuan Cui wrote:
> > From: gre...@linuxfoundation.org 
> > Sent: Thursday, November 1, 2018 11:57
> > To: Dexuan Cui 
> > 
> > On Wed, Oct 31, 2018 at 11:23:54PM +, Dexuan Cui wrote:
> > > > From: Michael Kelley 
> > > > Sent: Wednesday, October 24, 2018 08:38
> > > > From: k...@linuxonhyperv.com   Sent:
> > Wednesday,
> > > > October 17, 2018 10:10 PM
> > > > > From: Dexuan Cui 
> > > > >
> > > > > In kvp_send_key(), we do need call process_ib_ipinfo() if
> > > > > message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it turns
> > out
> > > > > the userland hv_kvp_daemon needs the info of operation, adapter_id
> > and
> > > > > addr_family. With the incorrect fc62c3b1977d, the host can't get the
> > > > > VM's IP via KVP.
> > > > >
> > > > > And, fc62c3b1977d added a "break;", but actually forgot to initialize
> > > > > the key_size/value in the case of KVP_OP_SET, so the default key_size 
> > > > > of
> > > > > 0 is passed to the kvp daemon, and the pool files
> > > > > /var/lib/hyperv/.kvp_pool_* can't be updated.
> > > > >
> > > > > This patch effectively rolls back the previous fc62c3b1977d, and
> > > > > correctly fixes the "this statement may fall through" warnings.
> > > > >
> > > > > This patch is tested on WS 2012 R2 and 2016.
> > > > >
> > > > > Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may 
> > > > > fall
> > > > through" warnings")
> > > > > Signed-off-by: Dexuan Cui 
> > > > > Cc: K. Y. Srinivasan 
> > > > > Cc: Haiyang Zhang 
> > > > > Cc: Stephen Hemminger 
> > > > > Cc: 
> > > > > Signed-off-by: K. Y. Srinivasan 
> > > > > ---
> > > > >  drivers/hv/hv_kvp.c | 26 ++
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > > Reviewed-by: Michael Kelley 
> > >
> > > Hi Greg,
> > > Can you please take a look at this patch?
> > 
> > Nope, I'm not the hv maintainer, they need to look at this and ack it,
> > not me :)
> > 
> > greg k-h
> 
> Hi Greg,
> KY has added his Signed-off-by in the mail.
> 
> I'll ask the other HV maintainers to take a look as well.

Ok, then I'll look at it after 4.20-rc1 is out, nothing I can do until
then anyway...

thanks,

greg k-h

[PATCH v2 1/3] x86: add support for Huawei WMI hotkeys.

2018-11-01 Thread Ayman Bagabas

This driver adds support for missing hotkeys on some Huawei laptops.
Currently, only Huawei Matebook X Pro is supported. The driver
recognizes the following keys: brightness keys, micmute, wlan, and
Huawei special key. The brightness keys are ignored since they work out
of the box.

Signed-off-by: Ayman Bagabas 
---
 drivers/platform/x86/Kconfig  |  13 ++
 drivers/platform/x86/Makefile |   1 +
 drivers/platform/x86/huawei_wmi.c | 223 ++
 3 files changed, 237 insertions(+)
 create mode 100644 drivers/platform/x86/huawei_wmi.c

diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 0c1aa6c314f5..c6813981e45c 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -1229,6 +1229,19 @@ config I2C_MULTI_INSTANTIATE
  To compile this driver as a module, choose M here: the module
  will be called i2c-multi-instantiate.
 
+config HUAWEI_LAPTOP
+   tristate "Huawei WMI hotkeys driver"
+   depends on ACPI
+   depends on ACPI_WMI
+   depends on INPUT
+   select INPUT_SPARSEKMAP
+   help
+ This driver provides support for Huawei WMI hotkeys.
+ It enables the missing keys and adds support to micmute
+ led found on these laptops.
+ Supported devices are:
+ - Matebook X Pro
+
 endif # X86_PLATFORM_DEVICES
 
 config PMC_ATOM
diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile
index e6d1becf81ce..5984354e18ff 100644
--- a/drivers/platform/x86/Makefile
+++ b/drivers/platform/x86/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_ACERHDF) += acerhdf.o
 obj-$(CONFIG_HP_ACCEL) += hp_accel.o
 obj-$(CONFIG_HP_WIRELESS)  += hp-wireless.o
 obj-$(CONFIG_HP_WMI)   += hp-wmi.o
+obj-$(CONFIG_HUAWEI_LAPTOP)+= huawei_wmi.o
 obj-$(CONFIG_AMILO_RFKILL) += amilo-rfkill.o
 obj-$(CONFIG_GPD_POCKET_FAN)   += gpd-pocket-fan.o
 obj-$(CONFIG_TC1100_WMI)   += tc1100-wmi.o
diff --git a/drivers/platform/x86/huawei_wmi.c 
b/drivers/platform/x86/huawei_wmi.c
new file mode 100644
index ..83545217ac19
--- /dev/null
+++ b/drivers/platform/x86/huawei_wmi.c
@@ -0,0 +1,223 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Huawei WMI Hotkeys Driver
+ *
+ *  Copyright (C) 2018   Ayman Bagabas 
+ *
+ *  This program is free software: you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation, either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program.  If not, see .
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_AUTHOR("Ayman Bagabas ");
+MODULE_DESCRIPTION("Huawei WMI hotkeys");
+MODULE_LICENSE("GPL");
+
+#define DEVICE_NAME "huawei"
+#define MODULE_NAME DEVICE_NAME"_wmi"
+
+/*
+ * Huawei WMI Devices GUIDs
+ */
+#define AMW0_GUID "ABBC0F5B-8EA1-11D1-A000-C9062910" // \_SB.AMW0
+
+/*
+ * Huawei WMI Events GUIDs
+ */
+#define EVENT_GUID "ABBC0F5C-8EA1-11D1-A000-C9062910"
+
+MODULE_ALIAS("wmi:"AMW0_GUID);
+MODULE_ALIAS("wmi:"EVENT_GUID);
+
+enum {
+   MICMUTE_LED_ON = 0x00010B04,
+   MICMUTE_LED_OFF = 0x0B04,
+};
+
+static const struct key_entry huawei_wmi_keymap[] __initconst = {
+   { KE_IGNORE, 0x281, { KEY_BRIGHTNESSDOWN } },
+   { KE_IGNORE, 0x282, { KEY_BRIGHTNESSUP } },
+   { KE_KEY,   0x287, { KEY_MICMUTE } },
+   { KE_KEY,   0x289, { KEY_WLAN } },
+   // Huawei |M| button
+   { KE_KEY,   0x28a, { KEY_PROG1 } },
+   { KE_END,   0 }
+};
+
+struct huawei_wmi_device {
+   struct input_dev *inputdev;
+};
+static struct huawei_wmi_device *wmi_device;
+
+int huawei_wmi_micmute_led_set(bool on)
+{
+   u32 args = (on) ? MICMUTE_LED_ON : MICMUTE_LED_OFF;
+   struct acpi_buffer input = { (acpi_size)sizeof(args), &args };
+   acpi_status status;
+
+   status = wmi_evaluate_method(AMW0_GUID, 0, 1, &input, NULL);
+   if (ACPI_FAILURE(status))
+   return status;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(huawei_wmi_micmute_led_set);
+
+static void huawei_wmi_process_key(struct input_dev *input_dev, int code)
+{
+   const struct key_entry *key;
+
+   key = sparse_keymap_entry_from_scancode(input_dev, code);
+
+   if (!key) {
+   pr_info("%s: Unknown key pressed, code: 0x%04x\n",
+   MODULE_NAME, code);
+   return;
+   }
+
+   sparse_keymap_report_entry(input_de

[PATCH v2 0/3] Huawei laptops WMI & sound fixes

2018-11-01 Thread Ayman Bagabas

This patch set fixes some of the issues with Huawei laptops. 

[PATCH v2 1/3] 
The first patch adds support for missing hotkeys on some models. Some hotkeys,
like brightness keys, work out of the box on these models.

[PATCH v2 2/3]
This one enables the front speakers on the Huawei Matebook X Pro (MBXP). This
solves bug 200501 https://bugzilla.kernel.org/show_bug.cgi?id=200501
It simply uses the pin configurations generated by hdajackretask using the
settings posted on the bug page https://imgur.com/a/N1xsCVZ

[PATCH v2 3/3]
This enables the micmute LED on Huawei laptops. It calls a WMI method, using
PATCH #1, to turn the micmute LED on/off.
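
As a rough illustration of what the hotkey handling in patch 1 does, the scancode lookup can be modelled in userspace. This is a sketch only, not the driver code: the model_* names and the enum values are hypothetical, the scancodes are copied from the keymap in patch 1, and the real driver goes through the kernel sparse-keymap helpers rather than a hand-written loop.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's KE_* entry types. */
enum model_type { MODEL_KE_KEY, MODEL_KE_IGNORE, MODEL_KE_END };

/* Hypothetical stand-ins for the kernel's KEY_* codes. */
enum model_key {
	MODEL_KEY_NONE,
	MODEL_KEY_BRIGHTNESSDOWN,
	MODEL_KEY_BRIGHTNESSUP,
	MODEL_KEY_MICMUTE,
	MODEL_KEY_WLAN,
	MODEL_KEY_PROG1,
};

struct model_key_entry {
	enum model_type type;
	unsigned int code;      /* WMI event code */
	enum model_key key;
};

/* Same scancodes as huawei_wmi_keymap[] in patch 1. */
static const struct model_key_entry model_keymap[] = {
	{ MODEL_KE_IGNORE, 0x281, MODEL_KEY_BRIGHTNESSDOWN },
	{ MODEL_KE_IGNORE, 0x282, MODEL_KEY_BRIGHTNESSUP },
	{ MODEL_KE_KEY,    0x287, MODEL_KEY_MICMUTE },
	{ MODEL_KE_KEY,    0x289, MODEL_KEY_WLAN },
	{ MODEL_KE_KEY,    0x28a, MODEL_KEY_PROG1 },   /* Huawei |M| button */
	{ MODEL_KE_END,    0,     MODEL_KEY_NONE },
};

/* Linear search up to the END sentinel; NULL means "unknown key". */
const struct model_key_entry *model_lookup(unsigned int code)
{
	const struct model_key_entry *e;

	for (e = model_keymap; e->type != MODEL_KE_END; e++)
		if (e->code == code)
			return e;
	return NULL;
}

/* Returns 1 if this model would forward the event as a key press. */
int model_should_report(unsigned int code)
{
	const struct model_key_entry *e = model_lookup(code);

	return e != NULL && e->type == MODEL_KE_KEY;
}
```

In this model the brightness codes are found in the table but not forwarded, which mirrors the intent stated above that those keys already work out of the box.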

Ayman Bagabas (3):
  x86: add support for Huawei WMI hotkeys.
  ALSA: hda: fix front speakers on Huawei MBXP.
  ALSA: hda: add support for Huawei WMI MicMute LED

 drivers/platform/x86/Kconfig  |  13 ++
 drivers/platform/x86/Makefile |   1 +
 drivers/platform/x86/huawei_wmi.c | 224 ++
 include/linux/huawei_wmi.h|   7 +
 sound/pci/hda/huawei_wmi_helper.c |  48 +++
 sound/pci/hda/patch_realtek.c |  28 
 6 files changed, 321 insertions(+)
 create mode 100644 drivers/platform/x86/huawei_wmi.c
 create mode 100644 include/linux/huawei_wmi.h
 create mode 100644 sound/pci/hda/huawei_wmi_helper.c

-- 
2.17.2

[PATCH v2 3/3] ALSA: hda: add support for Huawei WMI MicMute LED

2018-11-01 Thread Ayman Bagabas

Some Huawei laptops come with an LED in the mic mute key. This patch
turns this LED on and off accordingly.

Signed-off-by: Ayman Bagabas 
---
 drivers/platform/x86/huawei_wmi.c |  1 +
 include/linux/huawei_wmi.h|  7 +
 sound/pci/hda/huawei_wmi_helper.c | 48 +++
 sound/pci/hda/patch_realtek.c | 10 +++
 4 files changed, 66 insertions(+)
 create mode 100644 include/linux/huawei_wmi.h
 create mode 100644 sound/pci/hda/huawei_wmi_helper.c

diff --git a/drivers/platform/x86/huawei_wmi.c 
b/drivers/platform/x86/huawei_wmi.c
index 83545217ac19..cc5492571727 100644
--- a/drivers/platform/x86/huawei_wmi.c
+++ b/drivers/platform/x86/huawei_wmi.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_AUTHOR("Ayman Bagabas ");
 MODULE_DESCRIPTION("Huawei WMI hotkeys");
diff --git a/include/linux/huawei_wmi.h b/include/linux/huawei_wmi.h
new file mode 100644
index ..69b656c5029b
--- /dev/null
+++ b/include/linux/huawei_wmi.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __HUAWEI_WMI_H__
+#define __HUAWEI_WMI_H__
+
+int huawei_wmi_micmute_led_set(bool on);
+
+#endif
diff --git a/sound/pci/hda/huawei_wmi_helper.c 
b/sound/pci/hda/huawei_wmi_helper.c
new file mode 100644
index ..77edb658cbf0
--- /dev/null
+++ b/sound/pci/hda/huawei_wmi_helper.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Helper functions for Huawei WMI Mic Mute LED;
+ * to be included from codec driver
+ */
+
+#if IS_ENABLED(CONFIG_HUAWEI_LAPTOP)
+#include 
+
+static int (*huawei_wmi_micmute_led_set_func)(bool);
+
+static void update_huawei_wmi_micmute_led(struct hda_codec *codec)
+{
+   struct hda_gen_spec *spec = codec->spec;
+
+   huawei_wmi_micmute_led_set_func(spec->micmute_led.led_value);
+}
+
+static void alc_fixup_huawei_wmi(struct hda_codec *codec,
+  const struct hda_fixup *fix, int action)
+{
+   bool removefunc = false;
+
+   if (action == HDA_FIXUP_ACT_PROBE) {
+   if (!huawei_wmi_micmute_led_set_func)
+   huawei_wmi_micmute_led_set_func = 
symbol_request(huawei_wmi_micmute_led_set);
+   if (!huawei_wmi_micmute_led_set_func) {
+   codec_warn(codec, "Failed to find huawei_wmi symbol 
huawei_wmi_micmute_led_set\n");
+   return;
+   }
+   removefunc = (huawei_wmi_micmute_led_set_func(false) < 0)
+   || (snd_hda_gen_add_micmute_led(codec, 
update_huawei_wmi_micmute_led) < 0);
+
+   }
+
+   if (huawei_wmi_micmute_led_set_func && (action == HDA_FIXUP_ACT_FREE || 
removefunc)) {
+   symbol_put(huawei_wmi_micmute_led_set);
+   huawei_wmi_micmute_led_set_func = NULL;
+   }
+}
+
+#else
+
+static void alc_fixup_huawei_wmi(struct hda_codec *codec,
+   const struct hda_fixup *fix, int action)
+{
+}
+
+#endif
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 4f7a39c7883c..d09457d2a4f3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5374,6 +5374,9 @@ static void alc_fixup_thinkpad_acpi(struct hda_codec 
*codec,
 /* for alc295_fixup_hp_top_speakers */
 #include "hp_x360_helper.c"
 
+/* for alc_fixup_huawei_wmi */
+#include "huawei_wmi_helper.c"
+
 enum {
ALC269_FIXUP_SONY_VAIO,
ALC275_FIXUP_SONY_VAIO_GPIO2,
@@ -5494,6 +5497,7 @@ enum {
ALC255_FIXUP_DUMMY_LINEOUT_VERB,
ALC255_FIXUP_DELL_HEADSET_MIC,
ALC256_FIXUP_HUAWEI_MBXP_PINS,
+   ALC256_FIXUP_HUAWEI_WMI_MICMUTE_LED,
ALC295_FIXUP_HP_X360,
ALC221_FIXUP_HP_HEADSET_MIC,
 };
@@ -6348,6 +6352,10 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC269_FIXUP_HEADSET_MIC
},
+   [ALC256_FIXUP_HUAWEI_WMI_MICMUTE_LED] = {
+   .type = HDA_FIXUP_FUNC,
+   .v.func = alc_fixup_huawei_wmi
+   },
[ALC256_FIXUP_HUAWEI_MBXP_PINS] = {
.type = HDA_FIXUP_PINS,
.v.pins = (const struct hda_pintbl[]) {
@@ -6363,6 +6371,8 @@ static const struct hda_fixup alc269_fixups[] = {
{0x21, 0x04211020},
{ },
},
+   .chained = true,
+   .chain_id = ALC256_FIXUP_HUAWEI_WMI_MICMUTE_LED
},
[ALC295_FIXUP_HP_X360] = {
.type = HDA_FIXUP_FUNC,
-- 
2.17.2

[PATCH v2 2/3] ALSA: hda: fix front speakers on Huawei MBXP.

2018-11-01 Thread Ayman Bagabas

This patch solves bug 200501 'Only 2 of 4 speakers playing sound.'
https://bugzilla.kernel.org/show_bug.cgi?id=200501
It enables the front speakers on Huawei Matebook X Pro laptops.
These laptops come with a Dolby Atmos sound system, and this pin
configuration enables the front speakers.

Signed-off-by: Ayman Bagabas 
---
 sound/pci/hda/patch_realtek.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 3ac7ba9b342d..4f7a39c7883c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5493,6 +5493,7 @@ enum {
ALC298_FIXUP_TPT470_DOCK,
ALC255_FIXUP_DUMMY_LINEOUT_VERB,
ALC255_FIXUP_DELL_HEADSET_MIC,
+   ALC256_FIXUP_HUAWEI_MBXP_PINS,
ALC295_FIXUP_HP_X360,
ALC221_FIXUP_HP_HEADSET_MIC,
 };
@@ -6347,6 +6348,22 @@ static const struct hda_fixup alc269_fixups[] = {
.chained = true,
.chain_id = ALC269_FIXUP_HEADSET_MIC
},
+   [ALC256_FIXUP_HUAWEI_MBXP_PINS] = {
+   .type = HDA_FIXUP_PINS,
+   .v.pins = (const struct hda_pintbl[]) {
+   {0x12, 0x90a60130},
+   {0x13, 0x4000},
+   {0x14, 0x90170110},
+   {0x18, 0x41f0},
+   {0x19, 0x04a11040},
+   {0x1a, 0x41f0},
+   {0x1b, 0x90170112},
+   {0x1d, 0x40759a05},
+   {0x1e, 0x41f0},
+   {0x21, 0x04211020},
+   { },
+   },
+   },
[ALC295_FIXUP_HP_X360] = {
.type = HDA_FIXUP_FUNC,
.v.func = alc295_fixup_hp_top_speakers,
@@ -6592,6 +6609,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x5109, "Thinkpad", 
ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
SND_PCI_QUIRK(0x17aa, 0x511e, "Thinkpad", ALC298_FIXUP_TPT470_DOCK),
SND_PCI_QUIRK(0x17aa, 0x511f, "Thinkpad", ALC298_FIXUP_TPT470_DOCK),
+   SND_PCI_QUIRK(0x19e5, 0x3204, "Huawei MBXP", 
ALC256_FIXUP_HUAWEI_MBXP_PINS),
SND_PCI_QUIRK(0x17aa, 0x3bf8, "Quanta FL1", ALC269_FIXUP_PCM_44K),
SND_PCI_QUIRK(0x17aa, 0x9e54, "LENOVO NB", ALC269_FIXUP_LENOVO_EAPD),
SND_PCI_QUIRK(0x1b7d, 0xa831, "Ordissimo EVE2 ", 
ALC269VB_FIXUP_ORDISSIMO_EVE2), /* Also known as Malata PC-B1303 */
-- 
2.17.2

Re: [git pull] mount API series

2018-11-01 Thread Al Viro

On Thu, Nov 01, 2018 at 11:59:23PM +, David Howells wrote:

>  (*) mount-api-core.  These are the internal-only patches that add the
>  fs_context, the legacy wrapper and the security hooks and make certain
>  filesystems make use of it.

FWIW, while rereading that series I'd spotted something very odd in erofs.
It's orthogonal to everything else, but just to make sure it doesn't get
lost:
* sbi->dev_name thing in erofs is used only for debugging printks,
basically.  Just use sb->s_id[] and be done with that.
* dump struct erofs_mount_private - you don't need dev_name in
your erofs_fill_super().  Just use mount_bdev() in usual fashion.
* what the hell are you doing with ->s_root???  Why would you
possibly want it hashed and what kind of dcache lookup could find it?
That d_rehash() looks deeply confused; what are you trying to do there?

Re: [PATCH] memory_hotplug: cond_resched in __remove_pages

2018-11-01 Thread Balbir Singh

On Wed, Oct 31, 2018 at 01:58:40PM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> We have received a bug report that unbinding a large pmem (>1TB)
> can result in a soft lockup:
> [  380.339203] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! 
> [ndctl:4365]
> [...]
> [  380.339316] Supported: Yes
> [  380.339318] CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default 
> #1 SLE12-SP4
> [  380.339318] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS 
> SE5C620.86B.01.00.0833.051120182255 05/11/2018
> [  380.339319] task: 9cce7d4410c0 task.stack: be9eb1bc4000
> [  380.339325] RIP: 0010:__put_page+0x62/0x80
> [  380.339326] RSP: 0018:be9eb1bc7d30 EFLAGS: 0282 ORIG_RAX: 
> ff10
> [  380.339327] RAX: 40540081c0d3 RBX: eb8f03557200 RCX: 
> 63af4000
> [  380.339328] RDX: 0002 RSI: 9cce75bff498 RDI: 
> 9e4a76072ff8
> [  380.339329] RBP: 000a43557200 R08:  R09: 
> be9eb1bc7bb0
> [  380.339329] R10: be9eb1bc7d08 R11:  R12: 
> 9e194a22a0e0
> [  380.339330] R13: 9cce7062fc10 R14: 9e194a22a0a0 R15: 
> 9cce6559c0e0
> [  380.339331] FS:  7fd132368880() GS:9cce7ea4() 
> knlGS:
> [  380.339332] CS:  0010 DS:  ES:  CR0: 80050033
> [  380.339332] CR2: 020820a0 CR3: 00017ef7a003 CR4: 
> 007606e0
> [  380.339333] DR0:  DR1:  DR2: 
> 
> [  380.339334] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [  380.339334] PKRU: 5554
> [  380.339334] Call Trace:
> [  380.339338]  devm_memremap_pages_release+0x152/0x260
> [  380.339342]  release_nodes+0x18d/0x1d0
> [  380.339347]  device_release_driver_internal+0x160/0x210
> [  380.339350]  unbind_store+0xb3/0xe0
> [  380.339355]  kernfs_fop_write+0x102/0x180
> [  380.339358]  __vfs_write+0x26/0x150
> [  380.339363]  ? security_file_permission+0x3c/0xc0
> [  380.339364]  vfs_write+0xad/0x1a0
> [  380.339366]  SyS_write+0x42/0x90
> [  380.339370]  do_syscall_64+0x74/0x150
> [  380.339375]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [  380.339377] RIP: 0033:0x7fd13166b3d0
> 
> It has been reported on an older (4.12) kernel but the current upstream
> code doesn't cond_resched in the hot remove code at all and the given
> range to remove might be really large. Fix the issue by calling cond_resched
> once per memory section.
> 
> Signed-off-by: Michal Hocko 
> ---
>  mm/memory_hotplug.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7e6509a53d79..1d87724fa558 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -587,6 +587,7 @@ int __remove_pages(struct zone *zone, unsigned long 
> phys_start_pfn,
>   for (i = 0; i < sections_to_remove; i++) {
>   unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
>  
> + cond_resched();
>   ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
>   altmap);
>   map_offset = 0;

Quick math tells me we're doing less than 44GiB per second of offlining then?

Here is a quick untested patch that might help with the speed as well

In hot remove, we try to clear poisoned pages, but
a small optimization to check if num_poisoned_pages
is 0 helps remove the iteration through nr_pages.

NOTE: We can make num_poisoned_pages counter per
section and speed this up even more in case we
do have some poisoned pages

Signed-off-by: Balbir Singh 
---
 mm/sparse.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/sparse.c b/mm/sparse.c
index 33307fc05c4d..c4280ef0f383 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -724,6 +724,9 @@ static void clear_hwpoisoned_pages(struct page *memmap, int 
nr_pages)
if (!memmap)
return;
 
+   if (atomic_long_read(&num_poisoned_pages) == 0)
+   return;
+
for (i = 0; i < nr_pages; i++) {
if (PageHWPoison(&memmap[i])) {
atomic_long_sub(1, &num_poisoned_pages);

Anyway for this patch:
Acked-by: Balbir Singh
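
The early-exit idea in the untested mm/sparse.c hunk above can be sketched in userspace roughly as follows. Everything here is an assumption for illustration: struct page_model, model_num_poisoned_pages and model_clear_hwpoisoned_pages are invented names, C11 atomics stand in for the kernel's atomic_long_t, and the return value (number of pages scanned) exists only so the effect of the shortcut is observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with only the poison flag. */
struct page_model {
	bool hwpoison;
};

/* Hypothetical stand-in for the global num_poisoned_pages counter. */
atomic_long model_num_poisoned_pages = 0;

/*
 * Clear poison flags in memmap[0..nr_pages).
 * Returns how many pages were actually scanned.
 */
size_t model_clear_hwpoisoned_pages(struct page_model *memmap, size_t nr_pages)
{
	size_t i;

	if (!memmap)
		return 0;

	/* Shortcut: nothing is poisoned anywhere, so skip the whole scan. */
	if (atomic_load(&model_num_poisoned_pages) == 0)
		return 0;

	for (i = 0; i < nr_pages; i++) {
		if (memmap[i].hwpoison) {
			long prev = atomic_fetch_sub(&model_num_poisoned_pages, 1);

			/* The counter must never drop below zero. */
			assert(prev > 0);
			(void)prev;
			memmap[i].hwpoison = false;
		}
	}
	return nr_pages;
}
```

With the global counter at zero the loop over nr_pages is never entered. As the NOTE above says, a per-section counter would let the scan be skipped even when some other section holds poisoned pages.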

Re: [PATCH v1 4/7] vfio: ap: AP Queue Interrupt Control VFIO ioctl calls

2018-11-01 Thread kbuild test robot

Hi Pierre,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on next-20181101]
[cannot apply to v4.19]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pierre-Morel/s390-vfio-ap-Using-GISA-for-AP-Interrupt/20181102-010854
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-allmodconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=s390 

Note: the 
linux-review/Pierre-Morel/s390-vfio-ap-Using-GISA-for-AP-Interrupt/20181102-010854
 HEAD 1235cf4914e223e3da89385619976de8eea4e9db builds fine.
  It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers/s390/crypto/vfio_ap_ops.c: In function 'ap_ioctl_setirq':
>> drivers/s390/crypto/vfio_ap_ops.c:915:18: error: 'GAL_ISC' undeclared (first 
>> use in this function); did you mean 'MAX_ISC'?
 aqic_gisa.isc = GAL_ISC;
 ^~~
 MAX_ISC
   drivers/s390/crypto/vfio_ap_ops.c:915:18: note: each undeclared identifier 
is reported only once for each function it appears in

vim +915 drivers/s390/crypto/vfio_ap_ops.c

   897  
   898  static int ap_ioctl_setirq(struct ap_matrix_mdev *matrix_mdev,
   899 struct vfio_ap_aqic *parm)
   900  {
   901  struct aqic_gisa aqic_gisa = reg2aqic(0);
   902  struct kvm_s390_gisa *gisa = matrix_mdev->kvm->arch.gisa;
   903  struct ap_status ap_status = reg2status(0);
   904  unsigned long p;
   905  int ret = -1;
   906  int apqn;
   907  uint32_t gd;
   908  
   909  apqn = (int)(parm->cmd & 0x);
   910  
   911  gd = matrix_mdev->kvm->vcpus[0]->arch.sie_block->gd;
   912  if (gd & 0x01)
   913  aqic_gisa.f = 1;
   914  aqic_gisa.gisc = matrix_mdev->gisc;
 > 915  aqic_gisa.isc = GAL_ISC;
   916  aqic_gisa.ir = 1;
   917  aqic_gisa.gisao = gisa->next_alert >> 4;
   918  
   919  p = (unsigned long) page_address(matrix_mdev->map->page);
   920  p += (matrix_mdev->map->guest_addr & 0x0fff);
   921  
   922  ret = ap_host_aqic((uint64_t)apqn, aqic2reg(aqic_gisa), p);
   923  parm->status = ret;
   924  
   925  ap_status = reg2status(ret);
   926  return (ap_status.rc) ? -EIO : 0;
   927  }
   928  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



[PATCH] driver: input: fix UBSAN warning in input_defuzz_abs_event

2018-11-01 Thread liujian

syzkaller triggered a UBSAN warning:

[  196.188950] UBSAN: Undefined behaviour in drivers/input/input.c:62:23
[  196.188958] signed integer overflow:
[  196.188964] -2147483647 - 104 cannot be represented in type 'int [2]'
[  196.188973] CPU: 7 PID: 4763 Comm: syz-executor Not tainted
4.19.0-514.55.6.9.x86_64+ #7
[  196.188977] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  196.188979] Call Trace:
[  196.189001]  dump_stack+0x91/0xeb
[  196.189014]  ubsan_epilogue+0x9/0x7c
[  196.189020]  handle_overflow+0x1d7/0x22c
[  196.189028]  ? __ubsan_handle_negate_overflow+0x18f/0x18f
[  196.189038]  ? __mutex_lock+0x213/0x13f0
[  196.189053]  ? drop_futex_key_refs+0xa0/0xa0
[  196.189070]  ? __might_fault+0xef/0x1b0
[  196.189096]  input_handle_event+0xe1b/0x1290
[  196.189108]  input_inject_event+0x1d7/0x27e
[  196.189119]  evdev_write+0x2cf/0x3f0
[  196.189129]  ? evdev_pass_values+0xd40/0xd40
[  196.189157]  ? mark_held_locks+0x160/0x160
[  196.189171]  ? __vfs_write+0xe0/0x6c0
[  196.189175]  ? evdev_pass_values+0xd40/0xd40
[  196.189179]  __vfs_write+0xe0/0x6c0
[  196.189186]  ? kernel_read+0x130/0x130
[  196.189204]  ? _cond_resched+0x15/0x30
[  196.189214]  ? __inode_security_revalidate+0xb8/0xe0
[  196.189222]  ? selinux_file_permission+0x354/0x430
[  196.189233]  vfs_write+0x160/0x440
[  196.189242]  ksys_write+0xc1/0x190
[  196.189248]  ? __ia32_sys_read+0xb0/0xb0
[  196.189259]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  196.189267]  ? do_syscall_64+0x22/0x4a0
[  196.189276]  do_syscall_64+0xa5/0x4a0
[  196.189287]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  196.189293] RIP: 0033:0x44e7c9
[  196.189299] Code: fc ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00

The syzkaller reproducer script (it does not reproduce the issue every time):

r0 = syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2,
0x1)
write$binfmt_elf64(r0, &(0x7f000240)={{0x7f, 0x45, 0x4c, 0x46, 0x40,
0x2, 0x2, 0x, 0x374c, 0x3, 0x0, 0x8001, 0x103,
0x40, 0x22e, 0x26, 0x1, 0x38, 0x2, 0xa23, 0x1, 0x2}, [{0x6474e557, 0x5,
0x6, 0x2, 0x9, 0x9, 0x6c3, 0x1ff}], "", [[], [], [], []]}, 0x478)
ioctl$EVIOCGSW(0x, 0x8040451b, &(0x7f40)=""/7)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2, 0x1)
r1 = syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2,
0x1)
openat$smack_task_current(0xff9c,
&(0x7f40)='/proc/self/attr/current\x00', 0x2, 0x0)
ioctl$EVIOCSABS0(r1, 0x401845c0, &(0x7f00)={0x4, 0x1, 0x4,
0xd1, 0x81, 0x3})
eventfd(0x1ff)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2,
0x200)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2, 0x1)
syz_open_dev$evdev(&(0x7f000100)='/dev/input/event#\x00', 0x2, 0x1)

Typecast int to long to fix the issue.

Signed-off-by: liujian 
---
 drivers/input/input.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/input/input.c b/drivers/input/input.c
index 3304aaa..24615ef 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -59,14 +59,17 @@ static inline int is_event_supported(unsigned int code,
 static int input_defuzz_abs_event(int value, int old_val, int fuzz)
 {
if (fuzz) {
-   if (value > old_val - fuzz / 2 && value < old_val + fuzz / 2)
+   if (value > (long)old_val - fuzz / 2 &&
+   value < (long)old_val + fuzz / 2)
return old_val;
 
-   if (value > old_val - fuzz && value < old_val + fuzz)
-   return (old_val * 3 + value) / 4;
+   if (value > (long)old_val - fuzz &&
+   value < (long)old_val + fuzz)
+   return ((long)old_val * 3 + value) / 4;
 
-   if (value > old_val - fuzz * 2 && value < old_val + fuzz * 2)
-   return (old_val + value) / 2;
+   if (value > (long)old_val - fuzz * 2 &&
+   value < (long)old_val + fuzz * 2)
+   return ((long)old_val + value) / 2;
}
 
return value;
-- 
2.7.4
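
For readers following the arithmetic: in the original code, expressions such as
old_val + fuzz / 2 are evaluated in int and can overflow when old_val is close
to INT_MAX or INT_MIN. The following is a minimal userspace sketch of the
patched logic, not the kernel code itself. The name defuzz_abs_event() is
hypothetical, and int64_t is used in place of the patch's (long) cast as an
assumption, so that the sketch also widens on ABIs where long is 32 bits.

```c
#include <stdint.h>

/*
 * Userspace model of the patched input_defuzz_abs_event().
 * Assumption: int64_t stands in for the kernel's (long) cast.
 */
static int defuzz_abs_event(int value, int old_val, int fuzz)
{
	int64_t old = old_val;	/* widen before adding or subtracting fuzz */

	if (fuzz) {
		/* Within half the fuzz: keep the old value. */
		if (value > old - fuzz / 2 && value < old + fuzz / 2)
			return old_val;

		/* Within the fuzz: weighted average, 3:1 in favour of old. */
		if (value > old - fuzz && value < old + fuzz)
			return (int)((old * 3 + value) / 4);

		/* Within twice the fuzz: plain average. */
		if (value > old - fuzz * 2 && value < old + fuzz * 2)
			return (int)((old + value) / 2);
	}

	return value;
}
```

As in the patch, fuzz * 2 is still computed in int, so the sketch only
addresses overflow that comes from old_val.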

Re: [PATCH 1/2] CHROMIUM: ASoC: rt5663: Add documentation for power supply support

2018-11-01 Thread Cheng-yi Chiang

Sorry! I made a mistake in the title.
I will fix them and re-post.
On Thu, Nov 1, 2018 at 8:40 PM Cheng-Yi Chiang  wrote:
>
> rt5663 codec driver will support setting CPVDD and AVDD power supply
> from device tree.
>
> Signed-off-by: Cheng-Yi Chiang 
> ---
>  Documentation/devicetree/bindings/sound/rt5663.txt | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/sound/rt5663.txt 
> b/Documentation/devicetree/bindings/sound/rt5663.txt
> index 23386446c63d6..d4058dfde0392 100644
> --- a/Documentation/devicetree/bindings/sound/rt5663.txt
> +++ b/Documentation/devicetree/bindings/sound/rt5663.txt
> @@ -36,6 +36,9 @@ Optional properties:
>"realtek,impedance_sensing_num" is 2. It means that there are 2 ranges of
>impedance in the impedance sensing function.
>
> +- avdd-supply: Power supply for AVDD, providing 1.8V.
> +- cpvdd-supply: Power supply for CPVDD, providing 3.5V.
> +
>  Pins on the device (for linking into audio routes) for RT5663:
>
>* IN1P
> @@ -51,4 +54,6 @@ rt5663: codec@12 {
> compatible = "realtek,rt5663";
> reg = <0x12>;
> interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
> +   avdd-supply = <&pp1800_a_alc5662>;
> +   cpvdd-supply = <&pp3500_a_alc5662>;
>  };
> --
> 2.19.1.568.g152ad8e336-goog
>

Re: [PATCH 4.9 23/35] x86/mm: Expand static page table for fixmap space

2018-11-01 Thread Feng Tang

Hi Ben,

On Thu, Nov 01, 2018 at 10:25:43PM +, Ben Hutchings wrote:
> On Thu, 2018-10-11 at 17:35 +0200, Greg Kroah-Hartman wrote:
> > 4.9-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Feng Tang 
> > 
> > commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.
> 
> This backport is incorrect.  The part that updated __startup_64() in
> arch/x86/kernel/head64.c was dropped, presumably because that function
> doesn't exist in 4.9.  However that seems to be an essential part of the
> fix.  In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would
> need to be changed instead.
> 
> I also found that this introduces new boot-time warnings on some
> systems if CONFIG_DEBUG_WX is enabled.
> 
> So, unless someone provides fixes for those issues, I think this should
> be reverted for the 4.9 branch.

Thanks for the catch, I'm fine with the revert for now.

- Feng

linux-next: Tree for Nov 2

2018-11-01 Thread Stephen Rothwell

Hi all,

Please do not add any v4.21/v5.1 code to your linux-next included trees
until after the merge window closes.

Changes since 20181101:

Removed trees: hvc (finished with)

Non-merge commits (relative to Linus' tree): 628
 817 files changed, 36481 insertions(+), 8817 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 290 trees (counting Linus' and 66 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (7260935d71b6 Merge tag 'ovl-update-4.20' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs)
Merging fixes/master (7c6c54b505b8 Merge branch 'i2c/for-next' of 
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux)
Merging kbuild-current/fixes (310c7585e830 Merge tag 'nfsd-4.20' of 
git://linux-nfs.org/~bfields/linux)
Merging arc-current/for-curr (a75e410a8bc2 ARCv2: boot log unaligned access in 
use)
Merging arm-current/fixes (3a58ac65e2d7 ARM: 8799/1: mm: fix pci_ioremap_io() 
offset check)
Merging arm64-fixes/for-next/fixes (ca2b497253ad arm64: perf: Reject 
stand-alone CHAIN events for PMUv3)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (84df9525b0c2 Linux 4.19)
Merging sparc/master (1f2b5b8e2df4 sparc64: Wire up compat getpeername and 
getsockname.)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (7de414a9dd91 net: drop skb on failure in ip_check_defrag())
Merging bpf/master (dfeb8f4c9692 Merge branch 'verifier-fixes')
Merging ipsec/master (533555e5cbb6 xfrm: Fix error return code in 
xfrm_output_one())
Merging netfilter/master (29a0dd66e953 netfilter: xt_IDLETIMER: add sysfs 
filename checking routine)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (3baafeffa48a iwlwifi: 1000: set the TFD queue 
size)
Merging mac80211/master (8d0be26c781a mac80211_hwsim: fix module init error 
paths for netlink)
Merging rdma-fixes/for-rc (a3671a4f973e RDMA/ucma: Fix Spectre v1 vulnerability)
Merging sound-current/for-linus (826b5de90c0b ALSA: firewire-lib: fix 
insufficient PCM rule for period/buffer size)
Merging sound-asoc-fixes/for-linus (fec2c565951e Merge branch 'asoc-4.19' into 
asoc-linus)
Merging regmap-fixes/for-linus (35a7f35ad1b1 Linux 4.19-rc8)
Merging regulator-fixes/for-linus (84df9525b0c2 Linux 4.19)
Merging spi-fixes/for-linus (83665c6da9d3 Merge branch 'spi-4.19' into 
spi-linus)
Merging pci-current/for-linus (2edab4df98d9 PCI: Expand the "PF" acronym in 
Kconfig help text)
Merging driver-core.current/driver-core-linus (310c7585e830 Merge tag 
'nfsd-4.20' of git://linux-nfs.org/~bfields/linux)
Merging tty.current/tty-linus (202dc3cc10b4 serial: sh-sci: Fix receive on 
SCIFA/SCIFB variants with DMA)
Merging usb.current/usb-linus (310c7585e830 Merge tag 'nfsd-4.20' of 
git://linux-nfs.org/~bfields/linux)
Merging usb-gadget-fixes/fixes (d9707490077b usb: dwc2: Fix call location of 
dwc2_check_core_endianness)
Merging usb-serial-fixes/usb-linus (0238df646e62 Linux 4.19-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (a930d8bd94d8 usb: chipidea: 
Always build ULPI c

Re: Will the recent memory leak fixes be backported to longterm kernels?

2018-11-01 Thread Roman Gushchin

On Fri, Nov 02, 2018 at 02:45:42AM +, Dexuan Cui wrote:
> > From: Roman Gushchin 
> > Sent: Thursday, November 1, 2018 17:58
> > 
> > On Fri, Nov 02, 2018 at 12:16:02AM +, Dexuan Cui wrote:
> > Hello, Dexuan!
> > 
> > A couple of issues have been revealed recently; here are the fixes
> > (hashes are from the next tree):
> > 
> > 5f4b04528b5f mm: don't reclaim inodes with many attached pages
> > 5a03b371ad6a mm: handle no memcg case in memcg_kmem_charge()
> > properly
> > 
> > These two patches should be added to the series.
> 
> Thanks for the new info!
>  
> > Re stable backporting, I'd really wait for some time. Memory reclaim is a
> > quite complex and fragile area, so even if patches are correct by 
> > themselves,
> > they can easily cause a regression by revealing some other issues (as it was
> > with the inode reclaim case).
> 
> I totally agree. I'm now just wondering if there is any temporary workaround,
> even if that means we have to run the kernel with some features disabled or
> with suboptimal performance?

I don't think there is any, except not using memory cgroups at all.
Limiting the number of cgroups that are created and destroyed helps too:
a faulty service running under systemd can be especially painful.

Thanks!

[PATCH] arm64: dts: nxp: add more thermal zone support

2018-11-01 Thread Yuantian Tang

To enable all the supported thermal sensors, add sensor ID information
to the thermal zone nodes.
The DTS files for ls1012a, ls1046a, ls1043a and ls1088a are updated.

Signed-off-by: Yuantian Tang 
---
 arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi |   39 +++
 arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi |   59 +++
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   55 ++
 arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi |   54 ++
 4 files changed, 75 insertions(+), 132 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
index 68ac78c..9526b66 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi
@@ -28,7 +28,7 @@
#address-cells = <1>;
#size-cells = <0>;
 
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x0>;
@@ -100,36 +100,7 @@
mask = <0x02>;
};
 
-   thermal-zones {
-   cpu_thermal: cpu-thermal {
-   polling-delay-passive = <1000>;
-   polling-delay = <5000>;
-   thermal-sensors = <&tmu 0>;
-
-   trips {
-   cpu_alert: cpu-alert {
-   temperature = <85000>;
-   hysteresis = <2000>;
-   type = "passive";
-   };
-
-   cpu_crit: cpu-crit {
-   temperature = <95000>;
-   hysteresis = <2000>;
-   type = "critical";
-   };
-   };
-
-   cooling-maps {
-   map0 {
-   trip = <&cpu_alert>;
-   cooling-device =
-   <&cpu0 THERMAL_NO_LIMIT
-   THERMAL_NO_LIMIT>;
-   };
-   };
-   };
-   };
+   #include "fsl-tmu.dtsi"
 
soc {
compatible = "simple-bus";
@@ -506,3 +477,9 @@
};
};
 };
+
+&thermal_zones {
+   thermal-zone0 {
+   status = "okay";
+   };
+};
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
index 7881e3d..3afc6d4 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi
@@ -37,7 +37,7 @@
 *
 * Currently supported enable-method is psci v0.2
 */
-   cpu0: cpu@0 {
+   cooling_map0: cpu0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a53";
reg = <0x0>;
@@ -146,36 +146,7 @@
mask = <0x02>;
};
 
-   thermal-zones {
-   cpu_thermal: cpu-thermal {
-   polling-delay-passive = <1000>;
-   polling-delay = <5000>;
-
-   thermal-sensors = <&tmu 3>;
-
-   trips {
-   cpu_alert: cpu-alert {
-   temperature = <85000>;
-   hysteresis = <2000>;
-   type = "passive";
-   };
-   cpu_crit: cpu-crit {
-   temperature = <95000>;
-   hysteresis = <2000>;
-   type = "critical";
-   };
-   };
-
-   cooling-maps {
-   map0 {
-   trip = <&cpu_alert>;
-   cooling-device =
-   <&cpu0 THERMAL_NO_LIMIT
-   THERMAL_NO_LIMIT>;
-   };
-   };
-   };
-   };
+   #include "fsl-tmu.dtsi"
 
timer {
compatible = "arm,armv8-timer";
@@ -747,3 +718,29 @@
 
 #include "qoriq-qman-portals.dtsi"
 #include "qoriq-bman-portals.dtsi"
+
+&thermal_zones {
+   thermal-zone0 {
+   status = "okay";
+   };
+
+   thermal-zone1 {
+   status = "okay";
+   };
+
+   thermal-zone2 {
+   status = "okay";
+   };
+
+   thermal-zon

Re: [git pull] work.afs

2018-11-01 Thread Linus Torvalds

On Thu, Nov 1, 2018 at 4:46 PM Al Viro  wrote:
>
> AFS series, with some iov_iter bits included.

Grr. Bad summary explanation of what actually is happening.

Also, this is very late in the merge window for no discernible reason for this.

I'm not happy. I'm taking it, but I'm no longer pulling random stuff
that I get after this.

   Linus

[PATCH] pinctrl: mediatek: Fix dependencies for EINT_MTK

2018-11-01 Thread Olof Johansson

Fixes the following config-time warning:

WARNING: unmet direct dependencies detected for EINT_MTK
  Depends on [n]: PINCTRL [=y] && (ARCH_MEDIATEK [=y] || COMPILE_TEST [=n]) && 
(PINCTRL_MTK [=n] || PINCTRL_MTK_MOORE [=n] || COMPILE_TEST [=n])
  Selected by [y]:
  - PINCTRL_MTK_PARIS [=y] && PINCTRL [=y] && OF [=y] && (ARCH_MEDIATEK [=y] || 
COMPILE_TEST [=n])

Fixes: 805250982bb5 ("pinctrl: mediatek: add pinctrl-paris that implements the 
vendor dt-bindings")
Signed-off-by: Olof Johansson 
---
 drivers/pinctrl/mediatek/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/mediatek/Kconfig b/drivers/pinctrl/mediatek/Kconfig
index 9d142e1da567..50efc9cc8ee7 100644
--- a/drivers/pinctrl/mediatek/Kconfig
+++ b/drivers/pinctrl/mediatek/Kconfig
@@ -3,7 +3,7 @@ menu "MediaTek pinctrl drivers"
 
 config EINT_MTK
bool "MediaTek External Interrupt Support"
-   depends on PINCTRL_MTK || PINCTRL_MTK_MOORE || COMPILE_TEST
+   depends on PINCTRL_MTK || PINCTRL_MTK_MOORE || PINCTRL_MTK_PARIS || 
COMPILE_TEST
select GPIOLIB
select IRQ_DOMAIN
 
-- 
2.11.0

Question: perf dso support for /proc/kallsyms

2018-11-01 Thread leo . yan

Hi all,

I found that 'perf script' fails to parse kernel symbols for Arm CoreSight
trace data when no kernel vmlinux file is specified.  Without a kernel
symbol file, the perf tool falls back to /proc/kallsyms for kernel symbol
parsing and, as a result, runs into the flow below:

  thread__find_addr_map(thread, cpumode, MAP__FUNCTION, address, &al);
  map__load(al.map);
  dso__data_read_offset(al.map->dso, machine, offset, buffer, size);
`-> data_read_offset()

I can see that data_read_offset() returns failure.  This is caused by the
offset sanity check "if (offset > dso->data.file_size)" (the whole function
is pasted below for context): when perf uses "/proc/kallsyms" to load kernel
symbols, 'dso->data.file_size' is set to zero, so the sanity check always
treats the offset as beyond the end of the file.

I still don't understand how the dso/map code supports "/proc/kallsyms" and
have no idea how to fix this issue, though I spent some time looking into it.

Could you give some suggestions?  Or, even better, if you have a fix for
this, I am glad to test it on my side.

static ssize_t data_read_offset(struct dso *dso, struct machine *machine,
u64 offset, u8 *data, ssize_t size)
{
if (data_file_size(dso, machine))
return -1;

/* Check the offset sanity. */
if (offset > dso->data.file_size)
return -1;

if (offset + size < offset)
return -1;

return cached_read(dso, machine, offset, data, size);
}

Thanks,
Leo Yan
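
The failure described above can be modelled outside of perf. This is only a
sketch under assumptions: struct dso_model and offset_check() are hypothetical
names, and the model copies just the two sanity checks from the pasted
data_read_offset(), taking the report that file_size is zero for
/proc/kallsyms at face value.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for the one field of struct dso that matters here. */
struct dso_model {
	uint64_t file_size;	/* reported to stay 0 for /proc/kallsyms */
};

/*
 * Mirrors only the sanity checks of data_read_offset().
 * Returns -1 on rejection, 0 where the real code would go on to read.
 */
static int offset_check(const struct dso_model *dso, uint64_t offset,
			ssize_t size)
{
	/* With file_size == 0, every non-zero offset is rejected here. */
	if (offset > dso->file_size)
		return -1;

	/* Reject ranges that wrap around the 64-bit offset space. */
	if (offset + size < offset)
		return -1;

	return 0;
}
```

With file_size left at zero, any kernel text address used as an offset is
rejected by the first check, which matches the behaviour reported above.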

RE: Will the recent memory leak fixes be backported to longterm kernels?

2018-11-01 Thread Dexuan Cui

> From: Roman Gushchin 
> Sent: Thursday, November 1, 2018 17:58
> 
> On Fri, Nov 02, 2018 at 12:16:02AM +, Dexuan Cui wrote:
> Hello, Dexuan!
> 
> A couple of issues have been revealed recently; here are the fixes
> (hashes are from the next tree):
> 
> 5f4b04528b5f mm: don't reclaim inodes with many attached pages
> 5a03b371ad6a mm: handle no memcg case in memcg_kmem_charge()
> properly
> 
> These two patches should be added to the series.

Thanks for the new info!
 
> Re stable backporting, I'd really wait for some time. Memory reclaim is a
> quite complex and fragile area, so even if patches are correct by themselves,
> they can easily cause a regression by revealing some other issues (as it was
> with the inode reclaim case).

I totally agree. I'm now just wondering if there is any temporary workaround,
even if that means we have to run the kernel with some features disabled or
with suboptimal performance?

Thanks!
--Dexuan

[PATCH v2] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-01 Thread wangyufen

panic() may be called at the same time on different CPUs.
For example:
CPU 0:
  panic()
 __crash_kexec
   machine_crash_shutdown
 crash_smp_send_stop
   machine_kexec
 BUG_ON(num_online_cpus() > 1);

CPU 1:
  panic()
local_irq_disable
panic_smp_self_stop

If CPU 1 calls panic_smp_self_stop() before CPU 0 calls crash_smp_send_stop(),
kdump fails: CPU 1 cannot receive the IPI, so it stays online forever.
To fix this problem, this patch splits out panic_smp_self_stop() for ARM
and adds set_cpu_online(smp_processor_id(), false).

Signed-off-by: Yufen Wang 
---
 arch/arm/kernel/smp.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 9000d8b..d7b86e4 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -682,6 +682,21 @@ void smp_send_stop(void)
pr_warn("SMP: failed to stop secondary CPUs\n");
 }
 
+/* In case panic() and panic() called at the same time on CPU1 and CPU2,
+ * and CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop()
+ * CPU1 can't receive the ipi irqs from CPU2, CPU1 will be always online,
+ * kdump fails. So split out the panic_smp_self_stop() and add
+ * set_cpu_online(smp_processor_id(), false).
+ */
+void panic_smp_self_stop(void)
+{
+   pr_debug("CPU %u will stop doing anything useful since another CPU has 
paniced\n",
+smp_processor_id());
+   set_cpu_online(smp_processor_id(), false);
+   while (1)
+   cpu_relax();
+}
+
 /*
  * not supported here
  */
-- 
2.7.4

[GIT PULL] RISC-V Patches for the 4.20 Merge Window, Part 3

2018-11-01 Thread Palmer Dabbelt

The following changes since commit baa888d25ea64d0c59344d474284ca99cfdd449a:

  Merge branch 'next-keys2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 
(2018-11-01 15:23:59 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux.git 
tags/riscv-for-linus-4.20-mw3

for you to fetch changes up to ba1f0d95576902c10930d3467e638bac38f942f1:

  RISC-V: refresh defconfig (2018-11-01 17:04:07 -0700)


RISC-V Patches for the 4.20 Merge Window, Part 3

Sorry for the last minute patches, but it was suggested we try to push
this in before rc1 to make it easier for people to keep their branch
rebases sane.  Since this is just a single defconfig update that is
intended to have no functional change I thought it would be worth
breaking my own PR rules.


Anup Patel (1):
  RISC-V: refresh defconfig

 arch/riscv/configs/defconfig | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

[PATCH] ASoC: wm8996: fix small typo

2018-11-01 Thread sh liu

From e9b923690675ca8fa883fd25dcead5b457856735 Mon Sep 17 00:00:00 2001
From: liush 
Date: Fri, 2 Nov 2018 08:57:00 +0800
Subject: [PATCH] ASoC: wm8996: fix small typo

atleast -> at least

Change-Id: Icc970b438166daef13518b7d1a62b13eb8752f5f
Signed-off-by: liush 
---
 sound/soc/codecs/wm8996.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/wm8996.c b/sound/soc/codecs/wm8996.c
index 8affa49..d039835 100644
--- a/sound/soc/codecs/wm8996.c
+++ b/sound/soc/codecs/wm8996.c
@@ -2109,7 +2109,7 @@ static int wm8996_set_fll(struct snd_soc_codec
*codec, int fll_id, int source,
  if (i2c->irq)
  timeout *= 10;
  else
- /* ensure timeout of atleast 1 jiffies */
+ /* ensure timeout of at least 1 jiffies */
  timeout = timeout/2 ? : 1;

  for (retry = 0; retry < 10; retry++) {
-- 
1.9.1

[PATCH] ASoC: wm8996: fix small typo

2018-11-01 Thread lshua312

From e9b923690675ca8fa883fd25dcead5b457856735 Mon Sep 17 00:00:00 2001
From: liush 
Date: Fri, 2 Nov 2018 08:57:00 +0800
Subject: [PATCH] ASoC: wm8996: fix small typo

atleast -> at least

Change-Id: Icc970b438166daef13518b7d1a62b13eb8752f5f
Signed-off-by: liush 
---
 sound/soc/codecs/wm8996.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/codecs/wm8996.c b/sound/soc/codecs/wm8996.c
index 8affa49..d039835 100644
--- a/sound/soc/codecs/wm8996.c
+++ b/sound/soc/codecs/wm8996.c
@@ -2109,7 +2109,7 @@ static int wm8996_set_fll(struct snd_soc_codec *codec, 
int fll_id, int source,
if (i2c->irq)
timeout *= 10;
else
-   /* ensure timeout of atleast 1 jiffies */
+   /* ensure timeout of at least 1 jiffies */
timeout = timeout/2 ? : 1;
 
for (retry = 0; retry < 10; retry++) {
-- 
1.9.1

Re: [PATCH] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-01 Thread wangyufen

On 2018/11/1 19:34, Russell King - ARM Linux wrote:
> On Thu, Nov 01, 2018 at 07:20:49PM +0800, Wang Yufen wrote:
>> From: Yufen Wang 
>>
>> In case panic() and panic() called at the same time on different CPUS.
>> For example:
>> CPU 0:
>>   panic()
>>  __crash_kexec
>>machine_crash_shutdown
>>  crash_smp_send_stop
>>machine_kexec
>>  BUG_ON(num_online_cpus() > 1);
>>
>> CPU 1:
>>   panic()
>> local_irq_disable
>> panic_smp_self_stop
>>
>> If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump
>> fails. CPU1 can't receive the ipi irq, CPU1 will be always online.
>> I changed BUG_ON to WARN in kexec crash as arm64 does, kdump also fails.
>> Because num_online_cpus() > 1, can't disable the L2 in _soft_restart.
>> To fix this problem, this patch split out the panic_smp_self_stop()
>> and add set_cpu_online(smp_processor_id(), false).
> Thanks.
>
> I think this may as well go into arch/arm/kernel/smp.c - it won't be
> required for single-CPU systems, since there aren't "other" CPUs.
>
> It's probably also worth a comment above the function as to why we
> have this.

Thanks.

I will send v2.

>> Signed-off-by: Yufen Wang 
>> ---
>>  arch/arm/kernel/setup.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
>> index 31940bd..151861f 100644
>> --- a/arch/arm/kernel/setup.c
>> +++ b/arch/arm/kernel/setup.c
>> @@ -602,6 +602,16 @@ static void __init smp_build_mpidr_hash(void)
>>  }
>>  #endif
>>  
>> +void panic_smp_self_stop(void)
>> +{
>> +printk(KERN_DEBUG "CPU %u will stop doing anything useful since another 
>> CPU has paniced\n",
>> +smp_processor_id());
>> +set_cpu_online(smp_processor_id(), false);
>> +while (1)
>> +cpu_relax();
>> +
>> +}
>> +
>>  static void __init setup_processor(void)
>>  {
>>  struct proc_info_list *list;
>> -- 
>> 2.7.4
>>
>>

Re: Will the recent memory leak fixes be backported to longterm kernels?

2018-11-01 Thread Roman Gushchin

On Fri, Nov 02, 2018 at 12:16:02AM +, Dexuan Cui wrote:
> Hi all,
> When debugging a memory leak issue 
> (https://github.com/coreos/bugs/issues/2516)
> with v4.14.11-coreos, we noticed the same issue may have been fixed recently 
> by
> Roman in the latest mainline (i.e. Linus's master branch) according to 
> comment #7 of 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_ubuntu_-2Bsource_linux_-2Bbug_1792349&d=DwIFAg&c=5VD0RTtNlTh3ycd41b3MUw&r=i6WobKxbeG3slzHSIOxTVtYIJw7qjCE6S0spDTKL-J4&m=mrT9jcrhFvVxDpVBlxihJg6S6U91rlevOJby7y1YynE&s=1eHLVA-oQGqMd2ujRPU8kZMbkShOuIDD5CUgpM1IzGI&e=,
>  which lists these
> patches (I'm not sure if the 5-patch list is complete):
> 
> 010cb21d4ede math64: prevent double calculation of DIV64_U64_ROUND_UP() 
> arguments
> f77d7a05670d mm: don't miss the last page because of round-off error
> d18bf0af683e mm: drain memcg stocks on css offlining
> 71cd51b2e1ca mm: rework memcg kernel stack accounting
> f3a2fccbce15 mm: slowly shrink slabs with a relatively small number of objects
> 
> Obviously at least some of the fixes are also needed in the longterm kernels 
> like v4.14.y,
> but none of the 5 patches has the "Cc: sta...@vger.kernel.org" tag? I'm 
> wondering if
> these patches will be backported to the longterm kernels. BTW, the patches 
> are not
> in v4.19, but I suppose they will be in v4.19.1-rc1?

Hello, Dexuan!

A couple of issues have been revealed recently; here are the fixes
(hashes are from the next tree):

5f4b04528b5f mm: don't reclaim inodes with many attached pages
5a03b371ad6a mm: handle no memcg case in memcg_kmem_charge() properly

These two patches should be added to the series.

Re stable backporting, I'd really wait for some time. Memory reclaim is a
quite complex and fragile area, so even if patches are correct by themselves,
they can easily cause a regression by revealing some other issues (as it was
with the inode reclaim case).

Thanks!

[PATCH v3] x86/kvmclock : convert to SPDX identifiers

2018-11-01 Thread Peng Hao

Replace the verbose license text with the matching SPDX
license identifier.

Signed-off-by: Peng Hao 
---
 arch/x86/kernel/kvmclock.c | 15 +--
 1 files changed, 1 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 1e67646..a59325e 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -1,19 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+
 /*  KVM paravirtual clock driver. A clocksource implementation
 Copyright (C) 2008 Glauber de Oliveira Costa, Red Hat Inc.
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 */
 
 #include 
-- 
1.8.3.1

Re: [LKP] [sunrpc] 6a7da2a288: kernel_BUG_at_lib/iov_iter.c

2018-11-01 Thread David Howells

kernel test robot  wrote:

> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: 6a7da2a288ce412d7ac117a2912a7b0d9104ee6d ("[RFC] sunrpc: Fix flood of 
> warnings from iov_iter_kvec in linux-next")
> url: 
> https://github.com/0day-ci/linux/commits/Leonard-Crestez/sunrpc-Fix-flood-of-warnings-from-iov_iter_kvec-in-linux-next/20181101-070713
> base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu kvm64,+ssse3 -smp 2 -m 8G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):

Ummm...  You can't just apply that commit to Trond's linux-next branch unless
that branch also includes the iov_iter changes from my afs-next branch.

Before those changes, ITER_KVEC is required:

BUG_ON(!(direction & ITER_KVEC));

and after, it will be prohibited:

WARN_ON(direction & ~(READ | WRITE));

The reason for this is that I have yet more patches that split the direction
from the iov_iter::type member into their own member and turn the types into a
simple integer sequence instead of a bit mask.

David

Re: Will the recent memory leak fixes be backported to longterm kernels?

2018-11-01 Thread Sasha Levin


On Fri, Nov 02, 2018 at 12:16:02AM +, Dexuan Cui wrote:

Hi all,
When debugging a memory leak issue (https://github.com/coreos/bugs/issues/2516)
with v4.14.11-coreos, we noticed the same issue may have been fixed recently by
Roman in the latest mainline (i.e. Linus's master branch) according to comment 
#7 of
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792349, which lists these
patches (I'm not sure if the 5-patch list is complete):

010cb21d4ede math64: prevent double calculation of DIV64_U64_ROUND_UP() 
arguments
f77d7a05670d mm: don't miss the last page because of round-off error
d18bf0af683e mm: drain memcg stocks on css offlining
71cd51b2e1ca mm: rework memcg kernel stack accounting
f3a2fccbce15 mm: slowly shrink slabs with a relatively small number of objects

Obviously at least some of the fixes are also needed in the longterm kernels 
like v4.14.y,
but none of the 5 patches has the "Cc: sta...@vger.kernel.org" tag? I'm 
wondering if
these patches will be backported to the longterm kernels. BTW, the patches are 
not
in v4.19, but I suppose they will be in v4.19.1-rc1?


There was an issue with this series:
https://lkml.org/lkml/2018/10/23/586, so it's waiting on a fix to be
properly tested.

--
Thanks,
Sasha

Re: [PATCH v4] mm/page_owner: clamp read count to PAGE_SIZE

2018-11-01 Thread William Kucharski

> On Nov 1, 2018, at 3:47 PM, Andrew Morton  wrote:
> 
> - count = count > PAGE_SIZE ? PAGE_SIZE : count;
> + count = min_t(size_t, count, PAGE_SIZE);
>   kbuf = kmalloc(count, GFP_KERNEL);
>   if (!kbuf)
>   return -ENOMEM;

Is the use of min_t vs. the C conditional mostly to be more self-documenting?

The compiler-generated assembly between the two versions seems mostly a wash.

William Kucharski

Re: [RFC 0/2] RISC-V: A proposal to add vendor-specific code

2018-11-01 Thread Alan Kao

On Thu, Nov 01, 2018 at 10:50:04AM -0700, Palmer Dabbelt wrote:
> On Wed, 31 Oct 2018 17:55:42 PDT (-0700), alan...@andestech.com wrote:
> >On Wed, Oct 31, 2018 at 07:17:45AM -0700, Christoph Hellwig wrote:
> >>On Wed, Oct 31, 2018 at 04:46:10PM +0530, Anup Patel wrote:
> >>> I agree that we need a place for vendor-specific ISA extensions and
> >>> having vendor-specific directories is also good.
> >>
> >>The only sensible answer is that we should not allow vendor specific
> >>extensions in the kernel at all.  ...
> >
> >How can this even be possible if an extension includes an extra register
> >set as some domain-specific context?  In such a case, kernel should
> >at least process the context during any context switch, just like how it
> >deals with the FP context.
> 
> Ya, I think there are cases where vendor-specific extensions are going to be
> necessary to handle within the kernel.  Right now the only one I can think
> of is the performance counter stuff, where we explicitly allow
> vendor-specific counters as part of the ISA spec.
> 
> For stateful extensions, we currently have a standard mechanism where the XS
> bits get set in sstatus and the actual save/restore code is hidden behind an
> SBI call.  That call doesn't currently exist, but if we just go ahead and
> add one it should be easy to support this from within Linux.  We'll need to
> figure out how to enable these custom extensions from userspace, but that
> seems tractable as well.  We'll probably also want some fast-path for the V
> extension (and any other stateful standard extensions), but I think as long
> as the V extension adds a quick check for dirtiness then it's not a big
> deal.
> 
> Do you guys have stateful extensions?  We're trying really hard to avoid
> them at SiFive because they're a huge headache, so unless there's a
> compelling base of software using one I don't want to go add support if we
> can avoid it.

Currently no, but the future is hard to see.  As long as the extensible freedom
claimed by the RISC-V foundation remains true, such extensions may have their
role to play.  Don't worry for now; I was just giving an example to show that
in some possible vendor-specific cases the kernel cannot avoid getting involved.

Re: [PATCH] Make JFFS2 endianness configurable

2018-11-01 Thread Al Viro

On Thu, Nov 01, 2018 at 05:02:36PM -0700, Daniel Walker wrote:
> 
> 
> 
> On Thu, Nov 01, 2018 at 03:56:03PM -0700, Nikunj Kela wrote:
> > This patch allows the endianness of the JFFS2 filesystem to be
> > specified by config options.
> > 
> > It defaults to native-endian (the previously hard-coded option).
> > 
> > Some architectures benefit from having a single known endianness
> > of JFFS2 filesystem (for data, not executables) independent of the
> > endianness of the processor (ARM processors can be switched to either
> > endianness at run-time).
> > 
> 
> 
> The description is pretty sad .. We have a product which we released that uses
> JFFS2, and that product was released with a kernel in one endianness. Then later
> on we decided to change the endianness and now we're stuck with a JFFS2
> partition that has the wrong endianness, in a released product. This patch allows
> us to set the endianness to something different from the architecture setting.
> 
> So there is a significant use case for the change, at least for Cisco.

FWIW, can't we detect it at mount time, as e.g. UFS does?
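
A minimal userspace sketch of what mount-time detection could look like. It
assumes that every JFFS2 node header starts with the 16-bit magic bitmask
0x1985; the enum and the helper name are hypothetical, and this is not what
the posted patch implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: JFFS2 node headers begin with this 16-bit magic. */
#define JFFS2_MAGIC_BITMASK 0x1985

enum jffs2_endian {
	JFFS2_ENDIAN_UNKNOWN,
	JFFS2_ENDIAN_LITTLE,
	JFFS2_ENDIAN_BIG,
};

/* Hypothetical helper: inspect the first two bytes of a node header. */
static enum jffs2_endian jffs2_guess_endian(const uint8_t *node, size_t len)
{
	uint16_t le, be;

	assert(node != NULL);
	if (len < 2)
		return JFFS2_ENDIAN_UNKNOWN;

	/* Interpret the same two bytes in both byte orders. */
	le = (uint16_t)(node[0] | (node[1] << 8));
	be = (uint16_t)((node[0] << 8) | node[1]);

	if (le == JFFS2_MAGIC_BITMASK)
		return JFFS2_ENDIAN_LITTLE;
	if (be == JFFS2_MAGIC_BITMASK)
		return JFFS2_ENDIAN_BIG;
	return JFFS2_ENDIAN_UNKNOWN;
}
```

A real implementation would probably want to sample more than one node
before trusting the result, since a single erased or corrupted block would
otherwise report "unknown".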

[PATCH v3] genirq/matrix: Choose CPU for managed IRQs based on how many of them are allocated

2018-11-01 Thread Long Li

From: Long Li 

On a large system with multiple devices of the same class (e.g. NVMe disks,
using managed IRQs), the kernel tends to concentrate their IRQs on several
CPUs.

The issue is that when NVMe calls irq_matrix_alloc_managed(), the assigned
CPU tends to be the first several CPUs in the cpumask, because they check for
cpumap->available that will not change after managed IRQs are reserved.

For a managed IRQ, it tends to reserve more than one CPU, based on cpumask in
irq_matrix_reserve_managed. But later when actually allocating CPU for this
IRQ, only one CPU is allocated. Because "available" is calculated at the time
managed IRQ is reserved, it tends to indicate a CPU has more IRQs than the
actual number it's assigned.

To get a more even distribution for allocating managed IRQs, we need to keep
track of how many of them are allocated on a given CPU. Introduce
"managed_allocated" in struct cpumap to track those managed IRQs that are
allocated on this CPU, and change the code to use this information for deciding
how to allocate CPU for managed IRQs.

Signed-off-by: Long Li 
---
 kernel/irq/matrix.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 6e6d467f3dec..94dd173f24d6 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -14,6 +14,7 @@ struct cpumap {
unsigned int available;
unsigned int allocated;
unsigned int managed;
+   unsigned int managed_allocated;
bool initialized;
bool online;
unsigned long   alloc_map[IRQ_MATRIX_SIZE];
@@ -145,6 +146,27 @@ static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
return best_cpu;
 }
 
+/* Find the best CPU which has the lowest number of managed IRQs allocated */
+static unsigned int matrix_find_best_cpu_managed(struct irq_matrix *m,
+   const struct cpumask *msk)
+{
+   unsigned int cpu, best_cpu, allocated = UINT_MAX;
+   struct cpumap *cm;
+
+   best_cpu = UINT_MAX;
+
+   for_each_cpu(cpu, msk) {
+   cm = per_cpu_ptr(m->maps, cpu);
+
+   if (!cm->online || cm->managed_allocated > allocated)
+   continue;
+
+   best_cpu = cpu;
+   allocated = cm->managed_allocated;
+   }
+   return best_cpu;
+}
+
 /**
  * irq_matrix_assign_system - Assign system wide entry in the matrix
  * @m: Matrix pointer
@@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
if (cpumask_empty(msk))
return -EINVAL;
 
-   cpu = matrix_find_best_cpu(m, msk);
+   cpu = matrix_find_best_cpu_managed(m, msk);
if (cpu == UINT_MAX)
return -ENOSPC;
 
@@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
return -ENOSPC;
set_bit(bit, cm->alloc_map);
cm->allocated++;
+   cm->managed_allocated++;
m->total_allocated++;
*mapped_cpu = cpu;
trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
-- 
2.14.1
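
The selection logic in the patch above can be modelled in userspace roughly
as follows. This is only a sketch: struct cpumap_model and alloc_managed()
are hypothetical stand-ins for the kernel's per-CPU struct cpumap and
irq_matrix_alloc_managed(), and a plain array replaces the cpumask.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's per-CPU struct cpumap. */
struct cpumap_model {
	bool online;
	unsigned int managed_allocated;
};

/* Return the online CPU with the fewest managed IRQs, or UINT_MAX. */
static unsigned int find_best_cpu_managed(const struct cpumap_model *maps,
					  unsigned int ncpus)
{
	unsigned int cpu, best_cpu = UINT_MAX, allocated = UINT_MAX;

	assert(maps != NULL);
	for (cpu = 0; cpu < ncpus; cpu++) {
		/* Same comparison as the patch: ties go to the later CPU. */
		if (!maps[cpu].online || maps[cpu].managed_allocated > allocated)
			continue;
		best_cpu = cpu;
		allocated = maps[cpu].managed_allocated;
	}
	return best_cpu;
}

/* Allocate one managed IRQ and account for it; returns the chosen CPU. */
static unsigned int alloc_managed(struct cpumap_model *maps, unsigned int ncpus)
{
	unsigned int cpu = find_best_cpu_managed(maps, ncpus);

	if (cpu != UINT_MAX)
		maps[cpu].managed_allocated++;
	return cpu;
}
```

Because the counter is bumped at allocation time rather than at reservation
time, repeated calls to alloc_managed() rotate over the online CPUs instead
of piling up on the first few entries.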

Re: [PATCH v2] sched/core: Introduce set_next_task() helper for better code readability

2018-11-01 Thread Muchun Song

Hi, Peter

Thanks for your review.

I only updated the commit message, so there is no difference in the code
between v1 and v2.

Yours,
Muchun Song

On Fri, Nov 2, 2018 at 12:52 AM Peter Zijlstra wrote:
>
>
>
> What if anything is the difference with v1 (which I found yesterday and
> have pending testing).

Will the recent memory leak fixes be backported to longterm kernels?

2018-11-01 Thread Dexuan Cui

Hi all,
When debugging a memory leak issue (https://github.com/coreos/bugs/issues/2516)
with v4.14.11-coreos, we noticed the same issue may have been fixed recently by
Roman in the latest mainline (i.e. Linus's master branch) according to comment
#7 of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792349, which lists
these patches (I'm not sure if the 5-patch list is complete):

010cb21d4ede math64: prevent double calculation of DIV64_U64_ROUND_UP() arguments
f77d7a05670d mm: don't miss the last page because of round-off error
d18bf0af683e mm: drain memcg stocks on css offlining
71cd51b2e1ca mm: rework memcg kernel stack accounting
f3a2fccbce15 mm: slowly shrink slabs with a relatively small number of objects

Obviously at least some of the fixes are also needed in the longterm kernels
like v4.14.y, but none of the 5 patches has the "Cc: sta...@vger.kernel.org"
tag? I'm wondering if these patches will be backported to the longterm kernels.
BTW, the patches are not in v4.19, but I suppose they will be in v4.19.1-rc1?

Thanks,
-- Dexuan

Re: [PATCH] Make JFFS2 endianness configurable

2018-11-01 Thread Daniel Walker

On Thu, Nov 01, 2018 at 03:56:03PM -0700, Nikunj Kela wrote:
> This patch allows the endianness of the JFFS2 filesystem to be
> specified by config options.
> 
> It defaults to native-endian (the previously hard-coded option).
> 
> Some architectures benefit from having a single known endianness
> of JFFS2 filesystem (for data, not executables) independent of the
> endianness of the processor (ARM processors can be switched to either
> endianness at run-time).
> 

The description is pretty sad .. We have a product which we released that uses
JFFS2, and that product was released with a kernel in one endianness. Then later
on we decided to change the endianness and now we're stuck with a JFFS2
partition that has the wrong endianness, in a released product. This patch allows
us to set the endianness to something different from the architecture setting.

So there is a significant use case for the change, at least for Cisco.

Daniel

Re: [PATCH v4] mm/page_owner: clamp read count to PAGE_SIZE

2018-11-01 Thread Matthew Wilcox

On Thu, Nov 01, 2018 at 04:30:12PM -0700, Joe Perches wrote:
> On Thu, 2018-11-01 at 14:47 -0700, Andrew Morton wrote:
> > +++ a/mm/page_owner.c
> > @@ -351,7 +351,7 @@ print_page_owner(char __user *buf, size_
> > .skip = 0
> > };
> >  
> > -   count = count > PAGE_SIZE ? PAGE_SIZE : count;
> > +   count = min_t(size_t, count, PAGE_SIZE);
> > kbuf = kmalloc(count, GFP_KERNEL);
> > if (!kbuf)
> > return -ENOMEM;
> 
> A bit tidier still might be
> 
>   if (count > PAGE_SIZE)
>   count = PAGE_SIZE;
> 
> as that would not always cause a write back to count.

90% chance 'count' is already in a register and will stay there.  99.9%
chance that if it's not in a register, it's on the top of the stack,
which is by definition a hot, local, dirty cacheline.

What you're saying makes sense for a struct which might well be in a
shared cacheline state.  But for a function-local variable?  No.

Re: [git pull] mount API series

2018-11-01 Thread David Howells

Linus Torvalds  wrote:

> So if the patch series can be split up into a prep-phase that doesn't
> change any user-visible semantics (including the security side), but
> that uses the fs_context internally and allows the filesystems to be
> converted to the new world order, then that would make merging the
> early work much easier (and then my worry about the later phases would
> probably be much less too).

As a first go, I've rebased the patches to v4.19 (which required no other
changes), folded in some small bugfixes (fix error handling in remount, fix
incorrect user_ns in proc and mqueue) and split the set up.

There are now three branches in my git tree:

 (*) mount-api-core.  These are the internal-only patches that add the
 fs_context, the legacy wrapper and the security hooks and make certain
 filesystems make use of it.

 (*) mount-api-uapi.  This is mount-api-core with the UAPI-visible patches
 stacked thereon.

 (*) mount-api.  This is the original patchset.

"git diff mount-api mount-api-uapi" shows no differences.

Note that the commit "vfs: Implement logging through fs_context" appears in
both sets.  I was just going to leave it as macros that just wrap pr_notice(),
but I think it might be wiser to pull it out of line (as will be required
later) and make it produce messages at different levels.

The git tree in question is at:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git

David

Re: [PATCH] clk: fixed-factor: fix of_node_get-put imbalance

2018-11-01 Thread Stephen Boyd

Quoting Ricardo Ribalda Delgado (2018-11-01 06:15:49)
> When the fixed factor clock is created by devicetree,
> of_clk_add_provider is called.  Add a call to
> of_clk_del_provider in the remove function to balance
> it out.
> 
> Reported-by: Alan Tull 
> Fixes: 971451b3b15d ("clk: fixed-factor: Convert into a module platform driver")
> Signed-off-by: Ricardo Ribalda Delgado 
> ---

Looks good. I'll queue this up for clk-fixes next week.

[PATCH 2/8] pstore: Do not use crash buffer for decompression

2018-11-01 Thread Kees Cook

The pre-allocated compression buffer used for crash dumping was also
being used for decompression. This isn't technically safe, since it's
possible the kernel may attempt a crashdump while pstore is populating the
pstore filesystem (and performing decompression). Instead, just allocate
a separate buffer for decompression. Correctness is preferred over
performance here.

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c | 56 
 1 file changed, 25 insertions(+), 31 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index b821054ca3ed..8b6028948cf3 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -258,20 +258,6 @@ static int pstore_compress(const void *in, void *out,
return outlen;
 }
 
-static int pstore_decompress(void *in, void *out,
-unsigned int inlen, unsigned int outlen)
-{
-   int ret;
-
-   ret = crypto_comp_decompress(tfm, in, inlen, out, &outlen);
-   if (ret) {
-   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
-   return ret;
-   }
-
-   return outlen;
-}
-
 static void allocate_buf_for_compression(void)
 {
struct crypto_comp *ctx;
@@ -656,8 +642,9 @@ EXPORT_SYMBOL_GPL(pstore_unregister);
 
 static void decompress_record(struct pstore_record *record)
 {
+   int ret;
int unzipped_len;
-   char *decompressed;
+   char *unzipped, *workspace;
 
if (!record->compressed)
return;
@@ -668,35 +655,42 @@ static void decompress_record(struct pstore_record *record)
return;
}
 
-   /* No compression method has created the common buffer. */
+   /* Missing compression buffer means compression was not initialized. */
if (!big_oops_buf) {
-   pr_warn("no decompression buffer allocated\n");
+   pr_warn("no decompression method initialized!\n");
return;
}
 
-   unzipped_len = pstore_decompress(record->buf, big_oops_buf,
-record->size, big_oops_buf_sz);
-   if (unzipped_len <= 0) {
-   pr_err("decompression failed: %d\n", unzipped_len);
+   /* Allocate enough space to hold max decompression and ECC. */
+   unzipped_len = big_oops_buf_sz;
+   workspace = kmalloc(unzipped_len + record->ecc_notice_size,
+   GFP_KERNEL);
+   if (!workspace)
return;
-   }
 
-   /* Build new buffer for decompressed contents. */
-   decompressed = kmalloc(unzipped_len + record->ecc_notice_size,
-  GFP_KERNEL);
-   if (!decompressed) {
-   pr_err("decompression ran out of memory\n");
+   /* After decompression "unzipped_len" is almost certainly smaller. */
+   ret = crypto_comp_decompress(tfm, record->buf, record->size,
+ workspace, &unzipped_len);
+   if (ret) {
+   pr_err("crypto_comp_decompress failed, ret = %d!\n", ret);
+   kfree(workspace);
return;
}
-   memcpy(decompressed, big_oops_buf, unzipped_len);
 
/* Append ECC notice to decompressed buffer. */
-   memcpy(decompressed + unzipped_len, record->buf + record->size,
+   memcpy(workspace + unzipped_len, record->buf + record->size,
   record->ecc_notice_size);
 
-   /* Swap out compresed contents with decompressed contents. */
+   /* Copy decompressed contents into an minimum-sized allocation. */
+   unzipped = kmemdup(workspace, unzipped_len + record->ecc_notice_size,
+  GFP_KERNEL);
+   kfree(workspace);
+   if (!unzipped)
+   return;
+
+   /* Swap out compressed contents with decompressed contents. */
kfree(record->buf);
-   record->buf = decompressed;
+   record->buf = unzipped;
record->size = unzipped_len;
record->compressed = false;
 }
-- 
2.17.1
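
The approach taken by the patch above (decompress into a private worst-case
workspace, then copy the result into an exact-size allocation) can be
sketched in userspace like this. The run-length decoder is a toy stand-in
for crypto_comp_decompress(), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy run-length decoder standing in for crypto_comp_decompress():
 * input is (count, byte) pairs. On entry *outlen is the capacity of
 * "out"; on success it is updated to the number of bytes produced.
 */
static int toy_decompress(const unsigned char *in, size_t inlen,
			  unsigned char *out, size_t *outlen)
{
	size_t i, used = 0;

	if (inlen % 2)
		return -1;
	for (i = 0; i < inlen; i += 2) {
		if (in[i] > *outlen - used)
			return -1;	/* would overflow the workspace */
		memset(out + used, in[i + 1], in[i]);
		used += in[i];
	}
	*outlen = used;
	return 0;
}

/*
 * Decompress into a private worst-case workspace, then shrink the
 * result to an exact-size allocation. Returns NULL on any failure.
 */
static unsigned char *decompress_private(const unsigned char *in, size_t inlen,
					 size_t max_out, size_t *outlen)
{
	unsigned char *workspace, *result;
	size_t len = max_out;

	assert(in != NULL && outlen != NULL);
	workspace = malloc(max_out ? max_out : 1);
	if (!workspace)
		return NULL;
	if (toy_decompress(in, inlen, workspace, &len)) {
		free(workspace);
		return NULL;
	}
	result = malloc(len ? len : 1);
	if (result)
		memcpy(result, workspace, len);
	free(workspace);
	if (result)
		*outlen = len;
	return result;
}
```

The extra allocation and copy cost a little time, but no shared buffer is
touched, which is the trade-off the commit message describes.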

[PATCH linux-next 1/8] pstore/ram: Standardize module name in ramoops

2018-11-01 Thread Kees Cook

With both ram.c and ram_core.c built into ramoops.ko, it doesn't make
sense to have differing pr_fmt prefixes. This fixes ram_core.c to use
the module name (as ram.c already does). Additionally improves region
reservation error to include the region name.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 23ca6f2c98a0..f5d0173901aa 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -12,7 +12,7 @@
  *
  */
 
-#define pr_fmt(fmt) "persistent_ram: " fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
 #include 
@@ -443,7 +443,8 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size,
void *va;
 
if (!request_mem_region(start, size, label ?: "ramoops")) {
-   pr_err("request mem region (0x%llx@0x%llx) failed\n",
+   pr_err("request mem region (%s 0x%llx@0x%llx) failed\n",
+   label ?: "ramoops",
(unsigned long long)size, (unsigned long long)start);
return NULL;
}
-- 
2.17.1

[PATCH 6/8] pstore: Replace open-coded << with BIT()

2018-11-01 Thread Kees Cook

Minor clean-up to use BIT() (as already done in pstore_ram.h).

Signed-off-by: Kees Cook 
---
 include/linux/pstore.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 877ed81de346..3549f2ba865c 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -189,10 +189,10 @@ struct pstore_info {
 };
 
 /* Supported frontends */
-#define PSTORE_FLAGS_DMESG (1 << 0)
-#define PSTORE_FLAGS_CONSOLE   (1 << 1)
-#define PSTORE_FLAGS_FTRACE(1 << 2)
-#define PSTORE_FLAGS_PMSG  (1 << 3)
+#define PSTORE_FLAGS_DMESG BIT(0)
+#define PSTORE_FLAGS_CONSOLE   BIT(1)
+#define PSTORE_FLAGS_FTRACEBIT(2)
+#define PSTORE_FLAGS_PMSG  BIT(3)
 
 extern int pstore_register(struct pstore_info *);
 extern void pstore_unregister(struct pstore_info *);
-- 
2.17.1
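
For readers outside the kernel tree, a small userspace sketch of what the
conversion above amounts to. The BIT() definition here is an assumed
stand-in for the kernel macro, and frontends_enabled() is a hypothetical
helper added only for illustration.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's BIT() macro (assumed definition). */
#define BIT(nr) (1UL << (nr))

#define PSTORE_FLAGS_DMESG	BIT(0)
#define PSTORE_FLAGS_CONSOLE	BIT(1)
#define PSTORE_FLAGS_FTRACE	BIT(2)
#define PSTORE_FLAGS_PMSG	BIT(3)

/* Return nonzero if every frontend in "wanted" is enabled in "flags". */
static int frontends_enabled(unsigned long flags, unsigned long wanted)
{
	assert(wanted != 0);
	return (flags & wanted) == wanted;
}
```

The numeric values are unchanged by the clean-up; with this definition the
only difference is that the constants become unsigned long instead of int.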

[PATCH] genirq/affinity: Spread IRQs to all available NUMA nodes

2018-11-01 Thread Long Li

From: Long Li 

On systems with a large number of NUMA nodes, there may be more NUMA nodes than
the number of MSI/MSI-X interrupts that a device requests. The current code
always picks up the NUMA nodes starting from node 0, up to the number of
interrupts requested. This may leave some later NUMA nodes unused.

For example, if the system has 16 NUMA nodes, and the device requests 8
interrupts, NUMA nodes 0 to 7 are assigned to those interrupts, and NUMA nodes
8 to 15 are unused.

There are several problems with this approach:
1. Later, when those managed IRQs are allocated, they cannot be assigned to
NUMA nodes 8 to 15, which may create an IRQ concentration on NUMA nodes 0 to 7.
2. Some upper layers assume the affinity mask has complete coverage over NUMA
nodes. For example, the block layer uses the affinity mask to decide how to map
CPU queues to hardware queues; missing NUMA nodes in the masks may result in an
uneven mapping of queues. For the above example of 16 NUMA nodes, CPU queues on
NUMA nodes 0 to 7 are assigned to hardware queues 0 to 7, respectively. But CPU
queues on NUMA nodes 8 to 15 are all assigned to hardware queue 0.

Fix this problem by going over all NUMA nodes and assigning them round-robin to
all IRQs.

Signed-off-by: Long Li 
---
 kernel/irq/affinity.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..2d08b560d4b6 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -117,12 +117,13 @@ static int irq_build_affinity_masks(const struct irq_affinity *affd,
 */
if (numvecs <= nodes) {
for_each_node_mask(n, nodemsk) {
-   cpumask_copy(masks + curvec, node_to_cpumask[n]);
-   if (++done == numvecs)
-   break;
+   cpumask_or(masks + curvec, masks + curvec, node_to_cpumask[n]);
+   done++;
if (++curvec == last_affv)
curvec = affd->pre_vectors;
}
+   if (done > numvecs)
+   done = numvecs;
goto out;
}
 
-- 
2.14.1
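
The round-robin assignment described in the patch above can be modelled in
userspace as below. This is a sketch under simplifying assumptions: each
vector's mask is a bitmask of NUMA node IDs rather than a cpumask,
pre_vectors are ignored, and spread_nodes() and MAX_VECS are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VECS 64

/*
 * Spread "nodes" NUMA nodes over "numvecs" vectors round-robin.
 * masks[v] becomes a bitmask of the node IDs assigned to vector v
 * (a node-level stand-in for the per-vector cpumask).
 */
static void spread_nodes(unsigned int nodes, unsigned int numvecs,
			 uint64_t masks[MAX_VECS])
{
	unsigned int n, curvec = 0;

	assert(numvecs > 0 && numvecs <= MAX_VECS && nodes <= 64);
	memset(masks, 0, MAX_VECS * sizeof(masks[0]));

	for (n = 0; n < nodes; n++) {
		/* OR the node in instead of overwriting the mask. */
		masks[curvec] |= UINT64_C(1) << n;
		/* Wrap back to the first vector once all are used. */
		if (++curvec == numvecs)
			curvec = 0;
	}
}
```

For the 16-node, 8-vector case from the commit message, vector 0 ends up
covering nodes 0 and 8, vector 1 covers nodes 1 and 9, and so on, so no node
is left without a vector.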

[PATCH 7/8] pstore: Remove needless lock during console writes

2018-11-01 Thread Kees Cook

Since commit 70ad35db3321 ("pstore: Convert console write to use
->write_buf"), the console writer does not use the preallocated crash
dump buffer any more, so there is no reason to perform locking around it.

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c | 29 ++---
 1 file changed, 6 insertions(+), 23 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index a956c7bc3f67..32340e7dd6a5 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -461,31 +461,14 @@ static void pstore_unregister_kmsg(void)
 #ifdef CONFIG_PSTORE_CONSOLE
 static void pstore_console_write(struct console *con, const char *s, unsigned c)
 {
-   const char *e = s + c;
+   struct pstore_record record;
 
-   while (s < e) {
-   struct pstore_record record;
-   unsigned long flags;
-
-   pstore_record_init(&record, psinfo);
-   record.type = PSTORE_TYPE_CONSOLE;
-
-   if (c > psinfo->bufsize)
-   c = psinfo->bufsize;
+   pstore_record_init(&record, psinfo);
+   record.type = PSTORE_TYPE_CONSOLE;
 
-   if (oops_in_progress) {
-   if (!spin_trylock_irqsave(&psinfo->buf_lock, flags))
-   break;
-   } else {
-   spin_lock_irqsave(&psinfo->buf_lock, flags);
-   }
-   record.buf = (char *)s;
-   record.size = c;
-   psinfo->write(&record);
-   spin_unlock_irqrestore(&psinfo->buf_lock, flags);
-   s += c;
-   c = e - s;
-   }
+   record.buf = (char *)s;
+   record.size = c;
+   psinfo->write(&record);
 }
 
 static struct console pstore_console = {
-- 
2.17.1

[PATCH 3/8] pstore/ram: Report backend assignments with finer granularity

2018-11-01 Thread Kees Cook

In order to more easily perform automated regression testing, this
adds pr_debug() calls to report each prz allocation which can then be
verified against persistent storage. Specifically, seeing the dividing
line between header, data, any ECC bytes. (And the general assignment
output is updated to remove the bogus ECC blocksize which isn't actually
recorded outside the prz instance.)

Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c  | 4 ++--
 fs/pstore/ram_core.c | 6 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index b51901f97dc2..25bede911809 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -856,9 +856,9 @@ static int ramoops_probe(struct platform_device *pdev)
ramoops_pmsg_size = pdata->pmsg_size;
ramoops_ftrace_size = pdata->ftrace_size;
 
-   pr_info("attached 0x%lx@0x%llx, ecc: %d/%d\n",
+   pr_info("using 0x%lx@0x%llx, ecc: %d\n",
cxt->size, (unsigned long long)cxt->phys_addr,
-   cxt->ecc_info.ecc_size, cxt->ecc_info.block_size);
+   cxt->ecc_info.ecc_size);
 
return 0;
 
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index f5d0173901aa..d5bf9be82545 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -576,6 +576,12 @@ struct persistent_ram_zone *persistent_ram_new(phys_addr_t start, size_t size,
if (ret)
goto err;
 
+   pr_debug("attached %s 0x%lx@0x%llx: %lu header, %lu data, %lu ecc (%d/%d)\n",
+   prz->label, prz->size, (unsigned long long)prz->paddr,
+   sizeof(*prz->buffer), prz->buffer_size,
+   prz->size - sizeof(*prz->buffer) - prz->buffer_size,
+   prz->ecc_info.ecc_size, prz->ecc_info.block_size);
+
return prz;
 err:
persistent_ram_free(prz);
-- 
2.17.1

[PATCH 4/8] pstore/ram: Add kern-doc for struct persistent_ram_zone

2018-11-01 Thread Kees Cook

The struct persistent_ram_zone wasn't well documented. This adds kern-doc
for it.

Signed-off-by: Kees Cook 
---
 fs/pstore/ram_core.c   | 10 +
 include/linux/pstore_ram.h | 46 +++---
 2 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index d5bf9be82545..1cda5922b4b4 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -29,6 +29,16 @@
 #include 
 #include 
 
+/**
+ * struct persistent_ram_buffer - persistent circular RAM buffer
+ *
+ * @sig:
+ * signature to indicate header (PERSISTENT_RAM_SIG xor PRZ-type value)
+ * @start:
+ * offset into @data where the beginning of the stored bytes begin
+ * @size:
+ * number of valid bytes stored in @data
+ */
 struct persistent_ram_buffer {
uint32_t sig;
atomic_t start;
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 6e94980357d2..5d10ad51c1c4 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -30,6 +30,10 @@
  * PRZ_FLAG_NO_LOCK is used. For all other cases, locking is required.
  */
 #define PRZ_FLAG_NO_LOCK   BIT(0)
+/*
+ * If a PRZ should only have a single-boot lifetime, this marks it as
+ * getting wiped after its contents get copied out after boot.
+ */
 #define PRZ_FLAG_ZAP_OLD   BIT(1)
 
 struct persistent_ram_buffer;
@@ -43,17 +47,53 @@ struct persistent_ram_ecc_info {
uint16_t *par;
 };
 
+/**
+ * struct persistent_ram_zone - Details of a persistent RAM zone (PRZ)
+ *  used as a pstore backend
+ *
+ * @paddr: physical address of the mapped RAM area
+ * @size:  size of mapping
+ * @label: unique name of this PRZ
+ * @flags: holds PRZ_FLAGS_* bits
+ *
+ * @buffer_lock:
+ * locks access to @buffer "size" bytes and "start" offset
+ * @buffer:
+ * pointer to actual RAM area managed by this PRZ
+ * @buffer_size:
+ * bytes in @buffer->data (not including any trailing ECC bytes)
+ *
+ * @par_buffer:
+ * pointer into @buffer->data containing ECC bytes for @buffer->data
+ * @par_header:
+ * pointer into @buffer->data containing ECC bytes for @buffer header
+ * (i.e. all fields up to @data)
+ * @rs_decoder:
+ * RSLIB instance for doing ECC calculations
+ * @corrected_bytes:
+ * ECC corrected bytes accounting since boot
+ * @bad_blocks:
+ * ECC uncorrectable bytes accounting since boot
+ * @ecc_info:
+ * ECC configuration details
+ *
+ * @old_log:
+ * saved copy of @buffer->data prior to most recent wipe
+ * @old_log_size:
+ * bytes contained in @old_log
+ *
+ */
 struct persistent_ram_zone {
phys_addr_t paddr;
size_t size;
void *vaddr;
char *label;
-   struct persistent_ram_buffer *buffer;
-   size_t buffer_size;
u32 flags;
+
raw_spinlock_t buffer_lock;
+   struct persistent_ram_buffer *buffer;
+   size_t buffer_size;
 
-   /* ECC correction */
char *par_buffer;
char *par_header;
struct rs_control *rs_decoder;
-- 
2.17.1

[PATCH 0/8] pstore improvements (pstore-next)

2018-11-01 Thread Kees Cook

This is a posting of several patches I've been working on to improve
pstore. Most of it is better comments, output, and naming, but one
bug fix stands out: it fixes head-truncation of compressed records.
Details in the individual patches. Review appreciated! :)

-Kees

Kees Cook (8):
  pstore/ram: Standardize module name in ramoops
  pstore: Do not use crash buffer for decompression
  pstore/ram: Report backend assignments with finer granularity
  pstore/ram: Add kern-doc for struct persistent_ram_zone
  pstore: Improve and update some comments and status output
  pstore: Replace open-coded << with BIT()
  pstore: Remove needless lock during console writes
  pstore/ram: Correctly calculate usable PRZ bytes

 fs/pstore/platform.c   | 92 ++
 fs/pstore/ram.c| 19 
 fs/pstore/ram_core.c   | 25 +--
 include/linux/pstore.h | 15 ---
 include/linux/pstore_ram.h | 46 +--
 5 files changed, 116 insertions(+), 81 deletions(-)

-- 
2.17.1

[PATCH 8/8] pstore/ram: Correctly calculate usable PRZ bytes

2018-11-01 Thread Kees Cook

The actual number of bytes stored in a PRZ is smaller than the
bytes requested by platform data, since there is a header on each
PRZ. Additionally, if ECC is enabled, there are trailing bytes used
as well. Normally this mismatch doesn't matter since PRZs are circular
buffers and the leading "overflow" bytes are just thrown away. However, in
the case of a compressed record, this rather badly corrupts the results.

This corruption was visible with "ramoops.mem_size=204800 ramoops.ecc=1".
Any stored crashes could not be decompressed (producing a pstorefs
"dmesg-*.enc.z" file), triggering errors at boot:

  [2.790759] pstore: crypto_comp_decompress failed, ret = -22!

Reported-by: Joel Fernandes 
Fixes: b0aad7a99c1d ("pstore: Add compression support to pstore")
Signed-off-by: Kees Cook 
---
 fs/pstore/ram.c| 15 ++-
 include/linux/pstore.h |  5 -
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 25bede911809..10ac4d23c423 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -814,17 +814,14 @@ static int ramoops_probe(struct platform_device *pdev)
 
cxt->pstore.data = cxt;
/*
-* Console can handle any buffer size, so prefer LOG_LINE_MAX. If we
-* have to handle dumps, we must have at least record_size buffer. And
-* for ftrace, bufsize is irrelevant (if bufsize is 0, buf will be
-* ZERO_SIZE_PTR).
+* Since bufsize is only used for dmesg crash dumps, it
+* must match the size of the dprz record (after PRZ header
+* and ECC bytes have been accounted for).
 */
-   if (cxt->console_size)
-   cxt->pstore.bufsize = 1024; /* LOG_LINE_MAX */
-   cxt->pstore.bufsize = max(cxt->record_size, cxt->pstore.bufsize);
-   cxt->pstore.buf = kmalloc(cxt->pstore.bufsize, GFP_KERNEL);
+   cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
+   cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
if (!cxt->pstore.buf) {
-   pr_err("cannot allocate pstore buffer\n");
+   pr_err("cannot allocate pstore crash dump buffer\n");
err = -ENOMEM;
goto fail_clear;
}
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index 3549f2ba865c..f46e5df76b58 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -90,7 +90,10 @@ struct pstore_record {
  *
  * @buf_lock:  spinlock to serialize access to @buf
  * @buf:   preallocated crash dump buffer
- * @bufsize:   size of @buf available for crash dump writes
+ * @bufsize:   size of @buf available for crash dump bytes (must match
+ * smallest number of bytes available for writing to a
+ * backend entry, since compressed bytes don't take kindly
+ * to being truncated)
  *
  * @read_mutex:serializes @open, @read, @close, and @erase callbacks
  * @flags: bitfield of frontends the backend can accept writes for
-- 
2.17.1
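
The size mismatch fixed by the patch above comes down to simple arithmetic,
sketched here in userspace. The function names are hypothetical, and the
header and ECC sizes used with them are illustrative inputs, not values taken
from the driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes left for record data in a zone of "zone_size" bytes once the
 * header and any trailing ECC bytes are subtracted. Returns 0 if the
 * overhead does not fit.
 */
static size_t prz_usable_bytes(size_t zone_size, size_t header_size,
			       size_t ecc_total)
{
	assert(header_size > 0);
	if (header_size > zone_size || ecc_total > zone_size - header_size)
		return 0;
	return zone_size - header_size - ecc_total;
}

/* Does a compressed record of "len" bytes fit without head truncation? */
static int record_fits(size_t len, size_t usable)
{
	return len <= usable;
}
```

If the crash dump buffer is sized from the requested zone size instead of
from the usable size, a compressed record can be longer than the space the
zone can hold, and the circular buffer then drops its leading bytes.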

[PATCH 5/8] pstore: Improve and update some comments and status output

2018-11-01 Thread Kees Cook

This improves and updates some comments:
 - dump handler comment out of sync from calling convention
 - fix kern-doc typo

and improves status output:
 - reminder that only kernel crash dumps are compressed
 - do not be silent about ECC infrastructure failures

Signed-off-by: Kees Cook 
---
 fs/pstore/platform.c   | 7 +++
 fs/pstore/ram_core.c   | 4 +++-
 include/linux/pstore.h | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index 8b6028948cf3..a956c7bc3f67 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -304,7 +304,7 @@ static void allocate_buf_for_compression(void)
big_oops_buf_sz = size;
big_oops_buf = buf;
 
-   pr_info("Using compression: %s\n", zbackend->name);
+   pr_info("Using crash dump compression: %s\n", zbackend->name);
 }
 
 static void free_buf_for_compression(void)
@@ -354,9 +354,8 @@ void pstore_record_init(struct pstore_record *record,
 }
 
 /*
- * callback from kmsg_dump. (s2,l2) has the most recently
- * written bytes, older bytes are in (s1,l1). Save as much
- * as we can from the end of the buffer.
+ * callback from kmsg_dump. Save as much as we can (up to kmsg_bytes) from the
+ * end of the buffer.
  */
 static void pstore_dump(struct kmsg_dumper *dumper,
enum kmsg_dump_reason reason)
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 1cda5922b4b4..e859e02f67a8 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -503,8 +503,10 @@ static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 sig,
bool zap = !!(prz->flags & PRZ_FLAG_ZAP_OLD);
 
ret = persistent_ram_init_ecc(prz, ecc_info);
-   if (ret)
+   if (ret) {
+   pr_warn("ECC failed %s\n", prz->label);
return ret;
+   }
 
sig ^= PERSISTENT_RAM_SIG;
 
diff --git a/include/linux/pstore.h b/include/linux/pstore.h
index a15bc4d48752..877ed81de346 100644
--- a/include/linux/pstore.h
+++ b/include/linux/pstore.h
@@ -85,7 +85,7 @@ struct pstore_record {
 /**
  * struct pstore_info - backend pstore driver structure
  *
- * @owner: module which is repsonsible for this backend driver
+ * @owner: module which is responsible for this backend driver
  * @name:  name of the backend driver
  *
  * @buf_lock:  spinlock to serialize access to @buf
-- 
2.17.1

[git pull] work.afs

2018-11-01 Thread Al Viro

AFS series, with some iov_iter bits included.  Backmerge of NFS client
branch is due to conflict between sunrpc changes in there and
iov_iter_{k,b}vec() calling conventions change in iov_iter part; if you prefer
to do that yourself, just merge work.afs^^ and cherry-pick work.afs HEAD into
it (or do the fixup yourself - it's really trivial).  One trivial conflict
(also in sunrpc, with nfsd this time) due to the same commit; merge candidate
is in #proposed-merge.  IMO that one doesn't deserve a backmerge - it does
trigger textual conflict, unlike the NFS client one.

The following changes since commit 331bc71cb1751d78f6807ad8e6162b07c67cdd1b:

  SUNRPC: Convert the auth cred cache to use refcount_t (2018-10-23 12:24:33 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git work.afs

for you to fetch changes up to 0e9b4a82710220c04100892fb7277b78fd33a747:

  missing bits of "iov_iter: Separate type from direction and use accessor functions" (2018-11-01 18:19:03 -0400)


Al Viro (2):
  Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
  missing bits of "iov_iter: Separate type from direction and use accessor functions"

David Howells (25):
  amd-gpu: Don't undefine READ and WRITE
  iov_iter: Use accessor function
  iov_iter: Separate type from direction and use accessor functions
  iov_iter: Add I/O discard iterator
  afs: Better tracing of protocol errors
  afs: Set up the iov_iter before calling afs_extract_data()
  afs: Improve FS server rotation error handling
  afs: Implement VL server rotation
  afs: Fix TTL on VL server and address lists
  afs: Handle EIO from delivery function
  afs: Add a couple of tracepoints to log I/O errors
  afs: Don't invoke the server to read data beyond EOF
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Commit the status on a new file/dir/symlink
  afs: Remove callback details from afs_callback_break struct
  afs: Implement the YFS cache manager service
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Calc callback expiry in op reply delivery
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Expand data structure fields to support YFS
  afs: Implement YFS support in the fs client
  afs: Allow dumping of server cursor on operation failure
  afs: Eliminate the address pointer from the address list cursor
  afs: Fix callback handling
  afs: Probe multiple fileservers simultaneously

 block/bio.c   |2 +-
 drivers/block/drbd/drbd_main.c|2 +-
 drivers/block/drbd/drbd_receiver.c|2 +-
 drivers/block/loop.c  |9 +-
 drivers/block/nbd.c   |   12 +-
 drivers/fsi/fsi-sbefifo.c |4 +-
 drivers/gpu/drm/amd/display/dc/os_types.h |2 -
 drivers/isdn/mISDN/l1oip_core.c   |3 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c   |6 +-
 drivers/nvme/target/io-cmd-file.c |2 +-
 drivers/target/iscsi/iscsi_target_util.c  |6 +-
 drivers/target/target_core_file.c |6 +-
 drivers/usb/usbip/usbip_common.c  |2 +-
 drivers/xen/pvcalls-back.c|8 +-
 fs/9p/vfs_addr.c  |4 +-
 fs/9p/vfs_dir.c   |2 +-
 fs/9p/xattr.c |4 +-
 fs/afs/Kconfig|   12 +
 fs/afs/Makefile   |7 +-
 fs/afs/addr_list.c|  209 +--
 fs/afs/afs.h  |   50 +-
 fs/afs/cache.c|2 +-
 fs/afs/callback.c |   17 +-
 fs/afs/cell.c |   65 +-
 fs/afs/cmservice.c|  287 +++-
 fs/afs/dir.c  |   75 +-
 fs/afs/dynroot.c  |4 +-
 fs/afs/file.c |8 +-
 fs/afs/flock.c|   22 +-
 fs/afs/fs_probe.c |  270 
 fs/afs/fsclient.c |  583 
 fs/afs/inode.c|   37 +-
 fs/afs/internal.h |  322 -
 fs/afs/mntpt.c|5 +-
 fs/afs/proc.c |  110 +-
 fs/afs/protocol_yfs.h |  163 +++
 fs/afs/rotate.c   |  302 ++--
 fs/afs/rxrpc.c|  115 +-
 fs/afs/security.c |   13 +-
 fs/afs/server.c   |  145 +-
 fs/afs/server_list.c  |6 +-
 fs/afs/super.c|5 +-
 fs/afs/vl_list.c  |  340 +
 fs/afs/vl_

[PATCH resend] fs/posix_acl: fix kernel-doc warnings and typo

2018-11-01 Thread Randy Dunlap

From: Randy Dunlap 

Fix kernel-doc warnings in fs/posix_acl.c.
Also fix one typo (setgit -> setgid).

../fs/posix_acl.c:646: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
../fs/posix_acl.c:646: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
../fs/posix_acl.c:646: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file permissions")

Signed-off-by: Randy Dunlap 
Cc: Jan Kara 
Cc: Andreas Gruenbacher 
Cc: Alexander Viro 
Cc: linux-fsde...@vger.kernel.org
Acked-by: Andreas Gruenbacher 
Reviewed-by: Jan Kara 
---
v2: change *acl to *@acl

 fs/posix_acl.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- linux-next-20181101.orig/fs/posix_acl.c
+++ linux-next-20181101/fs/posix_acl.c
@@ -630,12 +630,15 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
 
 /**
  * posix_acl_update_mode  -  update mode in set_acl
+ * @inode: target inode
+ * @mode_p: mode (pointer) for update
+ * @acl: acl pointer
  *
  * Update the file mode when setting an ACL: compute the new file permission
  * bits based on the ACL.  In addition, if the ACL is equivalent to the new
- * file mode, set *acl to NULL to indicate that no ACL should be set.
+ * file mode, set *@acl to NULL to indicate that no ACL should be set.
  *
- * As with chmod, clear the setgit bit if the caller is not in the owning group
+ * As with chmod, clear the setgid bit if the caller is not in the owning group
  * or capable of CAP_FSETID (see inode_change_ok).
  *
  * Called from set_acl inode operations.

[PATCH resend] scripts/faddr2line: fix location of start_kernel in comment

2018-11-01 Thread Randy Dunlap

From: Randy Dunlap 

Fix a source file reference location to the correct path name.

Signed-off-by: Randy Dunlap 
Cc: Josh Poimboeuf 
---
 scripts/faddr2line |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20181101.orig/scripts/faddr2line
+++ linux-next-20181101/scripts/faddr2line
@@ -71,7 +71,7 @@ die() {
 
 # Try to figure out the source directory prefix so we can remove it from the
 # addr2line output.  HACK ALERT: This assumes that start_kernel() is in
-# kernel/init.c!  This only works for vmlinux.  Otherwise it falls back to
+# init/main.c!  This only works for vmlinux.  Otherwise it falls back to
 # printing the absolute path.
 find_dir_prefix() {
local objfile=$1

[PATCH resend] w1: add missing kernel-doc entry for of_match_table

2018-11-01 Thread Randy Dunlap

From: Randy Dunlap 

Fix kernel-doc warning for missing struct member description:

../include/linux/w1.h:281: warning: Function parameter or member 'of_match_table' not described in 'w1_family'

Signed-off-by: Randy Dunlap 
Cc: Evgeniy Polyakov 
---
 include/linux/w1.h |1 +
 1 file changed, 1 insertion(+)

--- linux-next-20181101.orig/include/linux/w1.h
+++ linux-next-20181101/include/linux/w1.h
@@ -266,6 +266,7 @@ struct w1_family_ops {
  * @family_entry:  family linked list
  * @fid:   8 bit family identifier
  * @fops:  operations for this family
+ * @of_match_table:Open Firmware device matching table
  * @refcnt:reference counter
  */
 struct w1_family {

[PATCH resend] arch/sh: mach-kfr2r09: fix struct mtd_oob_ops build warning

2018-11-01 Thread Randy Dunlap

From: Randy Dunlap 

arch/sh/boards/mach-kfr2r09/setup.c does not need to #include
<linux/mtd/onenand.h>, and doing so causes a build warning, so drop
that header file.

In file included from ../arch/sh/boards/mach-kfr2r09/setup.c:28:
../include/linux/mtd/onenand.h:225:12: warning: 'struct mtd_oob_ops' declared inside parameter list will not be visible outside of this definition or declaration
 struct mtd_oob_ops *ops);

Fixes: f3590dc32974 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver")

Reported-by: Geert Uytterhoeven 
Suggested-by: Miquel Raynal 
Signed-off-by: Randy Dunlap 
Reviewed-by: Miquel Raynal 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Jacopo Mondi 
Cc: Magnus Damm 
Cc: linux-...@lists.infradead.org
Cc: linux...@vger.kernel.org
---
 arch/sh/boards/mach-kfr2r09/setup.c |1 -
 1 file changed, 1 deletion(-)

--- linux-next-20181101.orig/arch/sh/boards/mach-kfr2r09/setup.c
+++ linux-next-20181101/arch/sh/boards/mach-kfr2r09/setup.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include <linux/mtd/onenand.h>
 #include 
 #include 
 #include

Re: [PATCH 0/5] Implement devm_of_clk_add_provider

2018-11-01 Thread Stephen Boyd

Quoting Ricardo Ribalda Delgado (2018-11-01 07:40:39)
> All Tull reported that there might be a great ammount of drivers with
> imbalance on clk_add_provider. This is an issue for Device tree overlays
> (and also a bug) https://lkml.org/lkml/2018/10/18/1103
> 
> This patchset implement a devm_ function of of_clk_add_provider, and
> fixes 3 drivers.
> 
> Drivers like clk-gpio will be easily fixed with coccinelle if this set
> is accepted. (I volunteer, I want to learn how to use it, just seen the
> great presentations from Julia).

We already have devm_of_clk_add_hw_provider(), so any instances of
of_clk_add_provider() should be replaced with that, instead of
propagating the usage of of_clk_add_provider() any further. I'll gladly
apply patches to convert drivers from struct clk based APIs to struct
clk_hw based APIs so that we can clearly split clk providers from clk
consumers. So if you're interested in working on some coccinelle script
for that it would be great!
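
As a rough illustration of why a devm_ registration removes the add/remove imbalance, here is a minimal user-space model of device-managed cleanup. It is not kernel code: struct model_dev, model_devm_add_action(), model_devm_add_provider() and model_dev_release() are hypothetical stand-ins for the devres machinery, and the clk provider is reduced to a counter.

```c
#include <stddef.h>

#define MODEL_MAX_ACTIONS 8

/* Hypothetical stand-in for a struct device carrying a devres list. */
struct model_dev {
	void (*action[MODEL_MAX_ACTIONS])(void *data);
	void *data[MODEL_MAX_ACTIONS];
	size_t nr_actions;
};

/* Number of currently registered (modelled) clk providers. */
int model_nr_providers;

/* Cleanup action: models the provider removal call. */
void model_del_provider(void *data)
{
	(void)data;
	model_nr_providers--;
}

/* Record a cleanup action; returns 0 on success, -1 if the list is full. */
int model_devm_add_action(struct model_dev *dev,
			  void (*fn)(void *data), void *data)
{
	if (dev->nr_actions == MODEL_MAX_ACTIONS)
		return -1;
	dev->action[dev->nr_actions] = fn;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Managed registration: the matching removal is queued automatically. */
int model_devm_add_provider(struct model_dev *dev)
{
	model_nr_providers++;
	if (model_devm_add_action(dev, model_del_provider, NULL)) {
		model_nr_providers--;	/* undo the registration on failure */
		return -1;
	}
	return 0;
}

/* Driver detach: run the recorded actions in reverse order. */
void model_dev_release(struct model_dev *dev)
{
	while (dev->nr_actions > 0) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}
```

In this model a driver never calls the removal itself, so it cannot forget it on an error path or on unbind; that is the property the managed variant is meant to provide.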

Re: [PATCH] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

On Fri, Oct 05, 2018 at 09:32:45AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 28, 2018 at 04:17:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > When using initramfs-tools with only the necessary dependencies to mount
> > the root filesystem, it will fail to include nvme drivers for a root on a
> > multipath nvme. That happens because the slaves relationship is not
> > present.
> > 
> > As discussed in [1], using slaves will break lsblk, because the slaves are
> > hidden from userspace, that is, they have no real block device, just an
> > entry under sysfs.
> > 
> > Introducing the paths subdir and using that on initramfs-tools makes it
> > possible to now boot a system with nvme multipath as root.
> 
> Do we need documentation how these paths links are supposed to work?
> Who is going to parse them?

Hi, Christoph.

I have just sent a v2 against block/for-next with a Documentation file
describing it. The first intended user is initramfs-tools, documented there as
well.
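
For illustration only, a user-space consumer could walk the new directory roughly as sketched here. This assumes the layout described in the patch (one symlink per path under the head device's paths directory); print_nvme_paths() is a hypothetical helper and is not taken from initramfs-tools.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Print the entries of a multipath head's "paths" directory, e.g.
 * /sys/block/nvme0n1/paths (the device name is only an example).
 * Returns the number of entries found, or -1 if the directory
 * cannot be opened (for instance on a kernel without this patch).
 */
int print_nvme_paths(const char *paths_dir)
{
	DIR *dir = opendir(paths_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip ".", ".." and any other dot entries. */
		if (ent->d_name[0] == '.')
			continue;
		printf("%s\n", ent->d_name);
		count++;
	}
	closedir(dir);
	return count;
}
```

A caller would treat -1 as "no paths directory here" and fall back to whatever dependency detection it used before.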

Thanks.
Cascardo.

[PATCH v2] nvme: create 'paths' entries for hidden controllers

2018-11-01 Thread Thadeu Lima de Souza Cascardo

When using initramfs-tools with only the necessary dependencies to mount
the root filesystem, it will fail to include nvme drivers for a root on a
multipath nvme. That happens because the slaves relationship is not
present.

As discussed in [1], using slaves will break lsblk, because the slaves are
hidden from userspace, that is, they have no real block device, just an
entry under sysfs.

Introducing the paths subdir and using that on initramfs-tools makes it
possible to now boot a system with nvme multipath as root.

[1] https://www.spinics.net/lists/stable/msg222779.html

Cc: Christoph Hellwig 
Cc: Potnuri Bharat Teja 
Cc: Keith Busch 
Cc: Hannes Reinecke 
Cc: Martin K. Petersen 
Signed-off-by: Thadeu Lima de Souza Cascardo 
---
 Documentation/ABI/testing/sysfs-block-nvme | 10 
 drivers/nvme/host/core.c   |  2 ++
 drivers/nvme/host/multipath.c  | 29 --
 drivers/nvme/host/nvme.h   |  9 +++
 4 files changed, 48 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-block-nvme

diff --git a/Documentation/ABI/testing/sysfs-block-nvme b/Documentation/ABI/testing/sysfs-block-nvme
new file mode 100644
index ..3fe51b7be1e1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-nvme
@@ -0,0 +1,10 @@
+What:  /sys/block/nvme*/paths
+Date:  Oct, 2019
+KernelVersion: v4.21
+Contact:   Thadeu Lima de Souza Cascardo 
+Description:
+   This is a directory containing symlinks to other block
+   devices, when the block device is a nvme multipath
+   device.
+Users: initramfs-tools
+
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9e4a30b05bd2..06be47e878f5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3115,6 +3115,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
device_add_disk(ctrl->device, ns->disk, nvme_ns_id_attr_groups);
 
nvme_mpath_add_disk(ns, id);
+   nvme_mpath_add_disk_links(ns);
nvme_fault_inject_init(ns);
kfree(id);
 
@@ -3138,6 +3139,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
nvme_fault_inject_fini(ns);
if (ns->disk && ns->disk->flags & GENHD_FL_UP) {
+   nvme_mpath_remove_disk_links(ns);
del_gendisk(ns->disk);
blk_cleanup_queue(ns->queue);
if (blk_get_integrity(ns->disk))
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 5e3cc8c59a39..65dabe7d6d7c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -317,9 +317,12 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
if (!head->disk)
return;
 
-   if (!(head->disk->flags & GENHD_FL_UP))
+   if (!(head->disk->flags & GENHD_FL_UP)) {
+   struct kobject *hd_kobj = &disk_to_dev(head->disk)->kobj;
device_add_disk(&head->subsys->dev, head->disk,
nvme_ns_id_attr_groups);
+   head->path_dir = kobject_create_and_add("paths", hd_kobj);
+   }
 
if (nvme_path_is_optimized(ns)) {
int node, srcu_idx;
@@ -530,6 +533,19 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
}
 }
 
+void nvme_mpath_add_disk_links(struct nvme_ns *ns)
+{
+   struct kobject *path_disk_kobj;
+
+   if (!ns->head->disk)
+   return;
+
+   path_disk_kobj = &disk_to_dev(ns->disk)->kobj;
+   if (sysfs_create_link(ns->head->path_dir, path_disk_kobj,
+   kobject_name(path_disk_kobj)))
+   return;
+}
+
 void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
if (!head->disk)
@@ -541,9 +557,19 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
kblockd_schedule_work(&head->requeue_work);
flush_work(&head->requeue_work);
blk_cleanup_queue(head->disk->queue);
+   kobject_put(head->path_dir);
put_disk(head->disk);
 }
 
+void nvme_mpath_remove_disk_links(struct nvme_ns *ns)
+{
+   if (!ns->head->disk)
+   return;
+
+   sysfs_remove_link(ns->head->path_dir,
+   kobject_name(&disk_to_dev(ns->disk)->kobj));
+}
+
 int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
int error;
@@ -593,4 +619,3 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl)
 {
kfree(ctrl->ana_log_buf);
 }
-
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9fefba039d1e..6093649d4696 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -287,6 +287,7 @@ struct nvme_ns_head {
int instance;
 #ifdef CONFIG_NVME_MULTIPATH
struct gendisk  *disk;
+   struct kobject  *path_dir;
struct bio_list requeue_list;
spinlock_t  requeue_lock;
struct work_st

EXP rcu: Revert expedited GP parallelization cleverness

2018-11-01 Thread Paul E. McKenney

> (Commit 258ba8e089db23f760139266c232f01bad73f85c from linux-rcu)
> 
> This commit reverts a series of commits starting with fcc635436501 ("rcu:
> Make expedited GPs handle CPU 0 being offline") and its successors, thus
> queueing each rcu_node structure's expedited grace-period initialization
> work on the first CPU of that rcu_node structure.
> 
> Suggested-by: Sebastian Andrzej Siewior 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: Sebastian Andrzej Siewior 
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 0b2c2ad69629..a0486414edb4 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -472,7 +472,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
>  static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>smp_call_func_t func)
>  {
> - int cpu;
>   struct rcu_node *rnp;
>  
>   trace_rcu_exp_grace_period(rsp->name, rcu_exp_gp_seq_endval(rsp), 
> TPS("reset"));
> @@ -494,13 +493,7 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
>   continue;
>   }
>   INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
> - preempt_disable();
> - cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
> - /* If all offline, queue the work on an unbound CPU. */
> - if (unlikely(cpu > rnp->grphi))
> - cpu = WORK_CPU_UNBOUND;
> - queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
> - preempt_enable();
> + queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
>   rnp->exp_need_flush = true;
>   }

How about instead changing the earlier "if" statement to read as follows?

	if (!READ_ONCE(rcu_par_gp_wq) ||
	    rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
	    rcu_is_last_leaf_node(rnp) ||
	    IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
		/* No workqueues yet or last leaf, do direct call. */
		sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
		continue;
	}

This just adds the "|| IS_ENABLED(CONFIG_PREEMPT_RT_FULL)" to the "if"
condition.

The advantage of this approach is that it leaves the parallelization
alone for mainline, and avoids the overhead of the workqueues for -rt.
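
The effect of the extended condition can be modelled in user space as a plain predicate. This is only a sketch: struct exp_select_model and exp_select_direct_call() are hypothetical names, and each flag stands in for the kernel expression noted in its comment.

```c
#include <stdbool.h>

/* Inputs to the decision, modelled as plain flags (all hypothetical). */
struct exp_select_model {
	bool wq_ready;		/* models READ_ONCE(rcu_par_gp_wq) being non-NULL */
	bool scheduler_running;	/* models rcu_scheduler_active == RCU_SCHEDULER_RUNNING */
	bool last_leaf;		/* models rcu_is_last_leaf_node(rnp) */
	bool rt_full;		/* models IS_ENABLED(CONFIG_PREEMPT_RT_FULL) */
};

/* Returns true when the leaf would be handled by a direct call. */
bool exp_select_direct_call(const struct exp_select_model *m)
{
	return !m->wq_ready || !m->scheduler_running ||
	       m->last_leaf || m->rt_full;
}
```

With rt_full false the predicate behaves as before, so mainline keeps queueing work for every leaf but the last; with rt_full true every leaf takes the direct-call branch and no workqueue is involved.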

Thanx, Paul

Re: [PATCH v4] mm/page_owner: clamp read count to PAGE_SIZE

2018-11-01 Thread Joe Perches

On Thu, 2018-11-01 at 14:47 -0700, Andrew Morton wrote:
> On Fri, 2 Nov 2018 01:00:07 +0800  wrote:
> 
> > From: Miles Chen 
> > 
> > The page owner read might allocate a large size of memory with
> > a large read count. Allocation fails can easily occur when doing
> > high order allocations.
> > 
> > Clamp buffer size to PAGE_SIZE to avoid arbitrary size allocation
> > and avoid allocation fails due to high order allocation.
> > 
> > ...
> > 
> > --- a/mm/page_owner.c
> > +++ b/mm/page_owner.c
> > @@ -351,6 +351,7 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn,
> > .skip = 0
> > };
> >  
> > +   count = count > PAGE_SIZE ? PAGE_SIZE : count;
> > kbuf = kmalloc(count, GFP_KERNEL);
> > if (!kbuf)
> > return -ENOMEM;
> 
> A bit tidier:
> 
> --- a/mm/page_owner.c~mm-page_owner-clamp-read-count-to-page_size-fix
> +++ a/mm/page_owner.c
> @@ -351,7 +351,7 @@ print_page_owner(char __user *buf, size_
>   .skip = 0
>   };
>  
> - count = count > PAGE_SIZE ? PAGE_SIZE : count;
> + count = min_t(size_t, count, PAGE_SIZE);
>   kbuf = kmalloc(count, GFP_KERNEL);
>   if (!kbuf)
>   return -ENOMEM;

A bit tidier still might be

	if (count > PAGE_SIZE)
		count = PAGE_SIZE;

as that would not always cause a write back to count.
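
A small user-space sketch of the two spellings, to show they clamp identically. MODEL_PAGE_SIZE is an assumed 4 KiB value and the function names are made up for the example; whether a compiler really emits a different store for the two forms is not something this demonstrates.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Ternary form: count is assigned on every call. */
size_t clamp_ternary(size_t count)
{
	count = count > MODEL_PAGE_SIZE ? MODEL_PAGE_SIZE : count;
	return count;
}

/* Conditional form: count is assigned only when it exceeds the limit. */
size_t clamp_if(size_t count)
{
	if (count > MODEL_PAGE_SIZE)
		count = MODEL_PAGE_SIZE;
	return count;
}
```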

rcu: make RCU_BOOST default on RT

2018-11-01 Thread Paul E. McKenney

> Since it is no longer invoked from the softirq people run into OOM more
> often if the priority of the RCU thread is too low. Making boosting
> default on RT should help in those case and it can be switched off if
> someone knows better.
> 
> Signed-off-by: Sebastian Andrzej Siewior 
> 
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 644264be90f0..0be2c96fb640 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -36,7 +36,7 @@ config TINY_RCU
>  
>  config RCU_EXPERT
>   bool "Make expert-level adjustments to RCU configuration"
> - default n
> + default y if PREEMPT_RT_FULL

Would it work to leave this as is, and ...

>   help
> This option needs to be enabled if you wish to make
> expert-level adjustments to RCU configuration.  By default,
> @@ -191,7 +191,7 @@ config RCU_FAST_NO_HZ
>  config RCU_BOOST
>   bool "Enable RCU priority boosting"
>   depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT

... make the above line instead be:

	depends on (RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT) || PREEMPT_RT_FULL

Or am I missing something?

I agree that the risk might currently seem small, but if Linus ever
starts building PREEMPT_RT_FULL kernels, I really really do not want
RCU_EXPERT to be set.  ;-)

Thanx, Paul

> - default n
> + default y if PREEMPT_RT_FULL
>   help
> This option boosts the priority of preempted RCU readers that
> block the current preemptible RCU grace period for too long.

Re: RFC: userspace exception fixups

2018-11-01 Thread Andy Lutomirski

On Thu, Nov 1, 2018 at 2:24 PM Linus Torvalds
 wrote:
>
> On Thu, Nov 1, 2018 at 12:31 PM Rich Felker  wrote:
> >
> > See my other emails in this thread. You would register the *address*
> > (in TLS) of a function pointer object pointing to the handler, rather
> > than the function address of the handler. Then switching handler is
> > just a single store in userspace, no syscalls involved.
>
> Yes.
>
> And for just EENTER, maybe that's the right model.
>
> If we want to generalize it to other thread-synchronous faults, it
> needs way more information and a list of handlers, but if we limit the
> thing to _only_ EENTER getting an SGX fault, then a single "this is
> the fault handler" address is probably the right thing to do.

It sounds like you're saying that the kernel should know, *before*
running any user fixup code, whether the fault in question is one that
wants a fixup.  Sounds reasonable.

I think it would be nice, but not absolutely necessary, if user code
didn't need to poke some value into TLS each time it ran a function
that had a fixup.  With the poke-into-TLS approach, it looks a lot
like rseq, and rseq doesn't nest very nicely.  I think we really want
this mechanism to Just Work.  So we could maybe have a syscall that
associates a list of fixups with a given range of text addresses.  We
might want the kernel to automatically zap the fixups when the text in
question is unmapped.
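
To make the "list of fixups per range of text addresses" idea concrete, here is a user-space sketch of the lookup such a table would need. Nothing here is an existing kernel interface: struct fixup_range and fixup_lookup() are hypothetical, and the table is assumed to be sorted by start address with non-overlapping ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one registered fixup for the text range [start, end). */
struct fixup_range {
	uintptr_t start;
	uintptr_t end;
	uintptr_t handler;	/* where to resume on a fault in the range */
};

/*
 * Binary search over ranges sorted by start and non-overlapping.
 * Returns the handler address, or 0 if the fault address has no fixup.
 */
uintptr_t fixup_lookup(const struct fixup_range *tbl, size_t n,
		       uintptr_t fault_ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (fault_ip < tbl[mid].start)
			hi = mid;
		else if (fault_ip >= tbl[mid].end)
			lo = mid + 1;
		else
			return tbl[mid].handler;
	}
	return 0;
}
```

Because the table is keyed by text address rather than by per-thread state, nested calls into fixup-protected code need no stores on entry, and dropping the entries for an unmapped range is a matter of removing them from the table.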

rcu: Frob softirq test

2018-11-01 Thread Paul E. McKenney

> With RT_FULL we get the below wreckage:

The code that this applies to has itself been fully frobbed as of the
current merge window.  I believe that it should now work in -rt as is,
but who knows?  ;-)

Thanx, Paul

> [  126.060484] ===
> [  126.060486] [ INFO: possible circular locking dependency detected ]
> [  126.060489] 3.0.1-rt10+ #30
> [  126.060490] ---
> [  126.060492] irq/24-eth0/1235 is trying to acquire lock:
> [  126.060495]  (&(lock)->wait_lock#2){+.+...}, at: [] 
> rt_mutex_slowunlock+0x16/0x55
> [  126.060503]
> [  126.060504] but task is already holding lock:
> [  126.060506]  (&p->pi_lock){-...-.}, at: [] 
> try_to_wake_up+0x35/0x429
> [  126.060511]
> [  126.060511] which lock already depends on the new lock.
> [  126.060513]
> [  126.060514]
> [  126.060514] the existing dependency chain (in reverse order) is:
> [  126.060516]
> [  126.060516] -> #1 (&p->pi_lock){-...-.}:
> [  126.060519][] lock_acquire+0x145/0x18a
> [  126.060524][] _raw_spin_lock_irqsave+0x4b/0x85
> [  126.060527][] task_blocks_on_rt_mutex+0x36/0x20f
> [  126.060531][] rt_mutex_slowlock+0xd1/0x15a
> [  126.060534][] rt_mutex_lock+0x2d/0x2f
> [  126.060537][] rcu_boost+0xad/0xde
> [  126.060541][] rcu_boost_kthread+0x7d/0x9b
> [  126.060544][] kthread+0x99/0xa1
> [  126.060547][] kernel_thread_helper+0x4/0x10
> [  126.060551]
> [  126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
> [  126.060555][] __lock_acquire+0x1157/0x1816
> [  126.060558][] lock_acquire+0x145/0x18a
> [  126.060561][] _raw_spin_lock+0x40/0x73
> [  126.060564][] rt_mutex_slowunlock+0x16/0x55
> [  126.060566][] rt_mutex_unlock+0x27/0x29
> [  126.060569][] rcu_read_unlock_special+0x17e/0x1c4
> [  126.060573][] __rcu_read_unlock+0x48/0x89
> [  126.060576][] select_task_rq_rt+0xc7/0xd5
> [  126.060580][] try_to_wake_up+0x175/0x429
> [  126.060583][] wake_up_process+0x15/0x17
> [  126.060585][] wakeup_softirqd+0x24/0x26
> [  126.060590][] irq_exit+0x49/0x55
> [  126.060593][] smp_apic_timer_interrupt+0x8a/0x98
> [  126.060597][] apic_timer_interrupt+0x13/0x20
> [  126.060600][] irq_forced_thread_fn+0x1b/0x44
> [  126.060603][] irq_thread+0xde/0x1af
> [  126.060606][] kthread+0x99/0xa1
> [  126.060608][] kernel_thread_helper+0x4/0x10
> [  126.060611]
> [  126.060612] other info that might help us debug this:
> [  126.060614]
> [  126.060615]  Possible unsafe locking scenario:
> [  126.060616]
> [  126.060617]CPU0CPU1
> [  126.060619]
> [  126.060620]   lock(&p->pi_lock);
> [  126.060623]lock(&(lock)->wait_lock);
> [  126.060625]lock(&p->pi_lock);
> [  126.060627]   lock(&(lock)->wait_lock);
> [  126.060629]
> [  126.060629]  *** DEADLOCK ***
> [  126.060630]
> [  126.060632] 1 lock held by irq/24-eth0/1235:
> [  126.060633]  #0:  (&p->pi_lock){-...-.}, at: [] 
> try_to_wake_up+0x35/0x429
> [  126.060638]
> [  126.060638] stack backtrace:
> [  126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
> [  126.060643] Call Trace:
> [  126.060644][] print_circular_bug+0x289/0x29a
> [  126.060651]  [] __lock_acquire+0x1157/0x1816
> [  126.060655]  [] ? trace_hardirqs_off_caller+0x1f/0x99
> [  126.060658]  [] ? rt_mutex_slowunlock+0x16/0x55
> [  126.060661]  [] lock_acquire+0x145/0x18a
> [  126.060664]  [] ? rt_mutex_slowunlock+0x16/0x55
> [  126.060668]  [] _raw_spin_lock+0x40/0x73
> [  126.060671]  [] ? rt_mutex_slowunlock+0x16/0x55
> [  126.060674]  [] ? rcu_report_qs_rsp+0x87/0x8c
> [  126.060677]  [] rt_mutex_slowunlock+0x16/0x55
> [  126.060680]  [] ? rcu_read_unlock_special+0x9b/0x1c4
> [  126.060683]  [] rt_mutex_unlock+0x27/0x29
> [  126.060687]  [] rcu_read_unlock_special+0x17e/0x1c4
> [  126.060690]  [] __rcu_read_unlock+0x48/0x89
> [  126.060693]  [] select_task_rq_rt+0xc7/0xd5
> [  126.060696]  [] ? select_task_rq_rt+0x27/0xd5
> [  126.060701]  [] ? clockevents_program_event+0x8e/0x90
> [  126.060704]  [] try_to_wake_up+0x175/0x429
> [  126.060708]  [] ? tick_program_event+0x1f/0x21
> [  126.060711]  [] wake_up_process+0x15/0x17
> [  126.060715]  [] wakeup_softirqd+0x24/0x26
> [  126.060718]  [] irq_exit+0x49/0x55
> [  126.060721]  [] smp_apic_timer_interrupt+0x8a/0x98
> [  126.060724]  [] apic_timer_interrupt+0x13/0x20
> [  126.060726][] ? migrate_disable+0x75/0x12d
> [  126.060733]  [] ? local_bh_disable+0xe/0x1f
> [  126.060736]  [] ? local_bh_disable+0x1d/0x1f
> [  126.060739]  [] irq_forced_thread_fn+0x1b/0x44
> [  126.060742]  [] ? _raw_spin_unlock_irq+0x3b/0x59
> [  126.060745]  [] irq_thread+0xde/0x1af
> [  126.060748]  [] ?

rcu: Merge RCU-bh into RCU-preempt

2018-11-01 Thread Paul E. McKenney

> The Linux kernel has long RCU-bh read-side critical sections that
> intolerably increase scheduling latency under mainline's RCU-bh rules,
> which include RCU-bh read-side critical sections being non-preemptible.
> This patch therefore arranges for RCU-bh to be implemented in terms of
> RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
> 
> This has the downside of defeating the purpose of RCU-bh, namely,
> handling the case where the system is subjected to a network-based
> denial-of-service attack that keeps at least one CPU doing full-time
> softirq processing.  This issue will be fixed by a later commit.
> 
> The current commit will need some work to make it appropriate for
> mainline use, for example, it needs to be extended to cover Tiny RCU.

The need for this goes away as of the current merge window because
RCU-bh has gone away.  (Aside from still being able to do things
like rcu_read_lock_bh() as a documentation device.)

Thanx, Paul

> [ paulmck: Added a useful changelog ]
> 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Paul E. McKenney 
> Link: http://lkml.kernel.org/r/20111005185938.ga20...@linux.vnet.ibm.com
> Signed-off-by: Thomas Gleixner 
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 63cd0a1a99a0..60a9b5feefe2 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -56,7 +56,11 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
>  #define  call_rcucall_rcu_sched
>  #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>  
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +#define call_rcu_bh  call_rcu
> +#else
>  void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
> +#endif
>  void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
>  void synchronize_sched(void);
>  void rcu_barrier_tasks(void);
> @@ -263,7 +267,14 @@ extern struct lockdep_map rcu_sched_lock_map;
>  extern struct lockdep_map rcu_callback_map;
>  int debug_lockdep_rcu_enabled(void);
>  int rcu_read_lock_held(void);
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +static inline int rcu_read_lock_bh_held(void)
> +{
> + return rcu_read_lock_held();
> +}
> +#else
>  int rcu_read_lock_bh_held(void);
> +#endif
>  int rcu_read_lock_sched_held(void);
>  
>  #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> @@ -663,10 +674,14 @@ static inline void rcu_read_unlock(void)
>  static inline void rcu_read_lock_bh(void)
>  {
>   local_bh_disable();
> +#ifdef CONFIG_PREEMPT_RT_FULL
> + rcu_read_lock();
> +#else
>   __acquire(RCU_BH);
>   rcu_lock_acquire(&rcu_bh_lock_map);
>   RCU_LOCKDEP_WARN(!rcu_is_watching(),
>"rcu_read_lock_bh() used illegally while idle");
> +#endif
>  }
>  
>  /*
> @@ -676,10 +691,14 @@ static inline void rcu_read_lock_bh(void)
>   */
>  static inline void rcu_read_unlock_bh(void)
>  {
> +#ifdef CONFIG_PREEMPT_RT_FULL
> + rcu_read_unlock();
> +#else
>   RCU_LOCKDEP_WARN(!rcu_is_watching(),
>"rcu_read_unlock_bh() used illegally while idle");
>   rcu_lock_release(&rcu_bh_lock_map);
>   __release(RCU_BH);
> +#endif
>   local_bh_enable();
>  }
>  
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 914655848ef6..462ce061bac7 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -44,7 +44,11 @@ static inline void rcu_virt_note_context_switch(int cpu)
>   rcu_note_context_switch(false);
>  }
>  
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +# define synchronize_rcu_bh  synchronize_rcu
> +#else
>  void synchronize_rcu_bh(void);
> +#endif
>  void synchronize_sched_expedited(void);
>  void synchronize_rcu_expedited(void);
>  
> @@ -72,7 +76,11 @@ static inline void synchronize_rcu_bh_expedited(void)
>  }
>  
>  void rcu_barrier(void);
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +# define rcu_barrier_bh  rcu_barrier
> +#else
>  void rcu_barrier_bh(void);
> +#endif
>  void rcu_barrier_sched(void);
>  bool rcu_eqs_special_set(int cpu);
>  unsigned long get_state_synchronize_rcu(void);
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index 4d04683c31b2..808cce9a5d43 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -528,7 +528,6 @@ static inline void show_rcu_gp_kthreads(void) { }
>  static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
>  #else /* #ifdef CONFIG_TINY_RCU */
>  unsigned long rcu_get_gp_seq(void);
> -unsigned long rcu_bh_get_gp_seq(void);
>  unsigned long rcu_sched_get_gp_seq(void);
>  unsigned long rcu_exp_batches_completed(void);
>  unsigned long rcu_exp_batches_completed_sched(void);
> @@ -536,10 +535,18 @@ unsigned long srcu_batches_completed(struct srcu_struct *sp);
>  void show_rcu_gp_kthreads(void);
>  int rcu_get_gp_kthreads_prio(void);
>  void rcu_force_quiescent_state(void);
> -void rcu_bh_force_quiescent_state(void);
>  void rcu_sched_force_quiescent_state(void);
>  extern struct workqueue_struct *rcu_gp_wq;
>  extern struct workqueue_stru

rcu: Make ksoftirqd do RCU quiescent states

2018-11-01 Thread Paul E. McKenney

> Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
> to network-based denial-of-service attacks.  This patch therefore
> makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
> is running in ksoftirqd context.  A wrapper layer in interposed so that
> other calls to __do_softirq() avoid invoking rcu_bh_qs().  The underlying
> function __do_softirq_common() does the actual work.
> 
> The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
> that there might be a local_bh_enable() inside an RCU-preempt read-side
> critical section.  This local_bh_enable() can invoke __do_softirq()
> directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
> calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
> an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
> read-side critical section.  Therefore, quiescent states can only happen
> in cases where __do_softirq() is invoked directly from ksoftirqd.

I -think- that the need for this goes away in the current merge window
because RCU-bh is going away.  There might still be an rt-specific need
to disable irqs, though.

Thanx, Paul

> Signed-off-by: Paul E. McKenney 
> Link: http://lkml.kernel.org/r/20111005184518.ga21...@linux.vnet.ibm.com
> Signed-off-by: Thomas Gleixner 
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 197088cdb56e..968579b86401 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -244,7 +244,19 @@ void rcu_sched_qs(void)
>  this_cpu_ptr(&rcu_sched_data), true);
>  }
>  
> -#ifndef CONFIG_PREEMPT_RT_FULL
> +#ifdef CONFIG_PREEMPT_RT_FULL
> +static void rcu_preempt_qs(void);
> +
> +void rcu_bh_qs(void)
> +{
> + unsigned long flags;
> +
> + /* Callers to this function, rcu_preempt_qs(), must disable irqs. */
> + local_irq_save(flags);
> + rcu_preempt_qs();
> + local_irq_restore(flags);
> +}
> +#else
>  void rcu_bh_qs(void)
>  {
>   RCU_LOCKDEP_WARN(preemptible(), "rcu_bh_qs() invoked with preemption 
> enabled!!!");
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 429a2f144e19..bee9bffeb0ce 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "../time/tick-internal.h"
> @@ -1407,7 +1408,7 @@ static void rcu_prepare_kthreads(int cpu)
>  
>  #endif /* #else #ifdef CONFIG_RCU_BOOST */
>  
> -#if !defined(CONFIG_RCU_FAST_NO_HZ)
> +#if !defined(CONFIG_RCU_FAST_NO_HZ) || defined(CONFIG_PREEMPT_RT_FULL)
>  
>  /*
>   * Check to see if any future RCU-related work will need to be done
> @@ -1423,7 +1424,9 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
>   *nextevt = KTIME_MAX;
>   return rcu_cpu_has_callbacks(NULL);
>  }
> +#endif /* !defined(CONFIG_RCU_FAST_NO_HZ) || defined(CONFIG_PREEMPT_RT_FULL) */
>  
> +#if !defined(CONFIG_RCU_FAST_NO_HZ)
>  /*
>   * Because we do not have RCU_FAST_NO_HZ, don't bother cleaning up
>   * after it.
> @@ -1520,6 +1523,8 @@ static bool __maybe_unused rcu_try_advance_all_cbs(void)
>   return cbs_ready;
>  }
>  
> +#ifndef CONFIG_PREEMPT_RT_FULL
> +
>  /*
>   * Allow the CPU to enter dyntick-idle mode unless it has callbacks ready
>   * to invoke.  If the CPU has callbacks, try to advance them.  Tell the
> @@ -1562,6 +1567,7 @@ int rcu_needs_cpu(u64 basemono, u64 *nextevt)
>   *nextevt = basemono + dj * TICK_NSEC;
>   return 0;
>  }
> +#endif /* #ifndef CONFIG_PREEMPT_RT_FULL */
>  
>  /*
>   * Prepare a CPU for idle from an RCU perspective.  The first major task

srcu: use cpu_online() instead of custom check

2018-11-01 Thread Paul E. McKenney

> The current check via srcu_online is slightly racy because after looking
> at srcu_online there could be an interrupt that interrupted us long
> enough until the CPU we checked against went offline.

I don't see how this can happen, even in -rt.  The call to
srcu_offline_cpu() happens very early in the CPU removal process,
which means that the synchronize_rcu_mult(call_rcu, call_rcu_sched)
in sched_cpu_deactivate() would wait for the interrupt to complete.
And for the enclosing preempt_disable region to complete.

Or is getting rid of that preempt_disable region the real reason for
this change?

> An alternative would be to hold the hotplug rwsem (so the CPUs don't
> change their state) and then check based on cpu_online() if we queue it
> on a specific CPU or not. queue_work_on() itself can handle if something
> is enqueued on an offline CPU but a timer which is enqueued on an offline
> CPU won't fire until the CPU is back online.
> 
> I am not sure if the removal in rcu_init() is okay or not. I assume that
> SRCU won't enqueue a work item before SRCU is up and ready.

That was the case before the current merge window, but use of call_srcu()
by tracing means that SRCU needs to be able to deal with call_srcu()
long before any initialization has happened.  The actual callbacks
won't be invoked until much later, after the scheduler and workqueues
are completely up and running, but call_srcu() can be invoked very early.

But I am not seeing any removal in rcu_init() in this patch, so I might
be missing something.

Thanx, Paul

> Signed-off-by: Sebastian Andrzej Siewior 
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 6c9866a854b1..3428a40a813e 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "rcu.h"
>  #include "rcu_segcblist.h"
> @@ -458,21 +459,6 @@ static void srcu_gp_start(struct srcu_struct *sp)
>   WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
>  }
>  
> -/*
> - * Track online CPUs to guide callback workqueue placement.
> - */
> -DEFINE_PER_CPU(bool, srcu_online);
> -
> -void srcu_online_cpu(unsigned int cpu)
> -{
> - WRITE_ONCE(per_cpu(srcu_online, cpu), true);
> -}
> -
> -void srcu_offline_cpu(unsigned int cpu)
> -{
> - WRITE_ONCE(per_cpu(srcu_online, cpu), false);
> -}
> -
>  /*
>   * Place the workqueue handler on the specified CPU if online, otherwise
>   * just run it whereever.  This is useful for placing workqueue handlers
> @@ -484,12 +470,12 @@ static bool srcu_queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>  {
>   bool ret;
>  
> - preempt_disable();
> - if (READ_ONCE(per_cpu(srcu_online, cpu)))
> + cpus_read_lock();
> + if (cpu_online(cpu))
>   ret = queue_delayed_work_on(cpu, wq, dwork, delay);
>   else
>   ret = queue_delayed_work(wq, dwork, delay);
> - preempt_enable();
> + cpus_read_unlock();
>   return ret;
>  }
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6868ef417e9f..e2e68250009b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3767,8 +3767,6 @@ int rcutree_online_cpu(unsigned int cpu)
>   rnp->ffmask |= rdp->grpmask;
>   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>   }
> - if (IS_ENABLED(CONFIG_TREE_SRCU))
> - srcu_online_cpu(cpu);
>   if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>   return 0; /* Too early in boot for scheduler work. */
>   sync_sched_exp_online_cleanup(cpu);
> @@ -3796,8 +3794,6 @@ int rcutree_offline_cpu(unsigned int cpu)
>   }
>  
>   rcutree_affinity_setting(cpu, cpu);
> - if (IS_ENABLED(CONFIG_TREE_SRCU))
> - srcu_offline_cpu(cpu);
>   return 0;
>  }
>

Re: Kernel panic when enabling cgroup2 io controller at runtime

2018-11-01 Thread Nishanth Aravamudan

On 01.11.2018 [12:03:40 -0700], Nishanth Aravamudan wrote:
> Hi,
> 
> tl;dr: I see a kernel NULL pointer dereference with Linus' master
> (7c6c54b5) when enabling the IO cgroup2 controller at runtime. Is this
> PEBKAC and if so what config option am I missing?

Actually, this might be totally unrelated to my cgroup testing, and just
happened to be exacerbated by it? Adding LKML to the CC, preserving the
prior oops below and pasting another oops I just got after waiting a bit
during a normal boot.

[   38.450985] BUG: unable to handle kernel NULL pointer dereference at 

[   38.458879] PGD 0 P4D 0 
[   38.461444] Oops:  [#1] SMP PTI
[   38.464964] CPU: 27 PID: 2159 Comm: auditd Kdump: loaded Tainted: G   O  4.19.0+ #3
[   38.473713] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.2.11 10/19/2017
[   38.481298] RIP: 0010:get_request+0x133/0x8b0
[   38.485674] Code: ff ff ff 41 f7 d4 48 89 85 78 ff ff ff 4c 01 f8 41 83 c4 
02 48 89 45 90 44 89 a5 74 ff ff ff 4d 8b 27 48 85 db 49 8b 44 24 18 <48> 8b 00 
48 89 855
[   38.504489] RSP: 0018:b59e5c3bb9c0 EFLAGS: 00010086
[   38.509722] RAX:  RBX: a0424bd78e00 RCX: 0001
[   38.516888] RDX: 355bbf83dbb0 RSI: 0800 RDI: a041eb1a6c80
[   38.524047] RBP: b59e5c3bba68 R08: 0060 R09: 9fe264871360
[   38.531188] R10: b59e5c3bbb28 R11: 1000 R12: 9fe2635d9360
[   38.538340] R13: 0001 R14: 0040 R15: a041eb1a6c40
[   38.545490] FS:  7fec3109c700() GS:a0427f54() 
knlGS:
[   38.553618] CS:  0010 DS:  ES:  CR0: 80050033
[   38.559381] CR2:  CR3: 00beaf27a002 CR4: 007606e0
[   38.566524] DR0:  DR1:  DR2: 
[   38.573680] DR3:  DR6: fffe0ff0 DR7: 0400
[   38.580830] PKRU: 5554
[   38.583543] Call Trace:
[   38.586013]  ? wait_woken+0x80/0x80
[   38.589543]  blk_queue_bio+0x131/0x460
[   38.593304]  generic_make_request+0x1a4/0x410
[   38.597673]  raid10_unplug+0x112/0x1b0 [raid10]
[   38.602211]  ? raid10_unplug+0x112/0x1b0 [raid10]
[   38.606927]  blk_flush_plug_list+0xce/0x250
[   38.611123]  blk_finish_plug+0x2c/0x40
[   38.614892]  ext4_writepages+0x635/0xe90
[   38.618837]  do_writepages+0x4b/0xe0
[   38.622424]  ? ext4_mark_inode_dirty+0x1d0/0x1d0
[   38.627068]  ? do_writepages+0x4b/0xe0
[   38.630838]  ? call_rcu+0x10/0x20
[   38.634168]  ? inode_switch_wbs+0x15d/0x190
[   38.638363]  __filemap_fdatawrite_range+0xc1/0x100
[   38.643161]  ? __filemap_fdatawrite_range+0xc1/0x100
[   38.648137]  file_write_and_wait_range+0x5a/0xb0
[   38.652767]  ext4_sync_file+0x111/0x3b0
[   38.656611]  vfs_fsync_range+0x48/0x80
[   38.660375]  ? __fget_light+0x54/0x60
[   38.664049]  do_fsync+0x3d/0x70
[   38.667203]  __x64_sys_fsync+0x14/0x20
[   38.670965]  do_syscall_64+0x5a/0x120
[   38.674639]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   38.679710] RIP: 0033:0x7fec320eeb07
[   38.683764] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 
53 89 fb 48 83 ec 10 e8 04 f5 ff ff 89 df 89 c2 b8 4a 00 00 00 0f 05 <48> 3d 00 
f0 ff ff4
[   38.703360] RSP: 002b:7fec3109be40 EFLAGS: 0293 ORIG_RAX: 
004a
[   38.711331] RAX: ffda RBX: 0005 RCX: 7fec320eeb07
[   38.718882] RDX:  RSI:  RDI: 0005
[   38.726428] RBP:  R08:  R09: 
[   38.733936] R10:  R11: 0293 R12: 7fec3109bfc0
[   38.741467] R13:  R14:  R15: 7ffdfe1da3e0
[   38.749025] Modules linked in: ebtable_filter ebtables ip6table_filter 
iptable_filter nbd vport_stt(O) openvswitch(O) nf_nat_ipv6 nf_nat_ipv4 
nf_conncount nf_nat u0
[   38.749064]  multipath linear mlx5_ib raid1 raid10 ses enclosure 
scsi_transport_sas ib_uverbs ib_core mlx5_core mgag200 i2c_algo_bit mlxfw ttm 
devlink drm_kms_helpi
[   38.861715] CR2: 
[0.061107] do_IRQ: 0.35 No irq handler for vector
[0.103225] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
[2.894501] scsi 0:0:32:0: Wrong diagnostic page; asked for 10 got 0
[3.263453] Out of memory: Kill process 222 (systemd-udevd) score 10 or sacrifice child
[3.271482] Killed process 222 (systemd-udevd) total-vm:26388kB, anon-rss:1244kB, file-rss:3088kB, shmem-rss:0kB
[3.325928] Out of memory: Kill process 225 (systemd-udevd) score 10 or sacrifice child
[3.333960] Killed process 386 (mdadm) total-vm:7236kB, anon-rss:120kB, file-rss:1788kB, shmem-rss:0kB
[3.981778] Out of memory: Kill process 450 (loadkeys) score 5 or sacrifice child
[3.989311] Killed process 450 (loadkeys) total-vm:4708kB, anon-rss:272kB, file-rss:1780kB, shmem-rss:0kB
[3.999073] Out of memory: Kill process 422 (console_setup) sco

[PATCH] Make JFFS2 endianness configurable

2018-11-01 Thread Nikunj Kela

This patch allows the endianness of the JFFS2 filesystem to be
specified by config options.

It defaults to native-endian (the previously hard-coded option).

Some architectures benefit from having a single known endianness
of JFFS2 filesystem (for data, not executables) independent of the
endianness of the processor (ARM processors can be switched to either
endianness at run-time).

This patch is taken from:
http://www.infradead.org/pipermail/linux-mtd/2006-January/014717.html

Cc: xe-linux-exter...@cisco.com
Signed-off-by: Rod Whitby 
Signed-off-by: Nikunj Kela 
---
 fs/jffs2/Kconfig| 25 +
 fs/jffs2/nodelist.h |  8 +++-
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/jffs2/Kconfig b/fs/jffs2/Kconfig
index ad850c5bf2ca..86e93fbc9d74 100644
--- a/fs/jffs2/Kconfig
+++ b/fs/jffs2/Kconfig
@@ -182,3 +182,28 @@ config JFFS2_CMODE_FAVOURLZO
  decompression) at the expense of size.
 
 endchoice
+
+choice
+   prompt "JFFS2 endianness"
+   default JFFS2_NATIVE_ENDIAN
+   depends on JFFS2_FS
+   help
+ You can set here the default endianness of JFFS2 from
+ the available options. Do not touch if unsure.
+
+config JFFS2_NATIVE_ENDIAN
+   bool "native endian"
+   help
+ Uses a native endian bytestream.
+
+config JFFS2_BIG_ENDIAN
+   bool "big endian"
+   help
+ Uses a big endian bytestream.
+
+config JFFS2_LITTLE_ENDIAN
+   bool "little endian"
+   help
+ Uses a little endian bytestream.
+
+endchoice
diff --git a/fs/jffs2/nodelist.h b/fs/jffs2/nodelist.h
index 0637271f3770..a1ebf04f217c 100644
--- a/fs/jffs2/nodelist.h
+++ b/fs/jffs2/nodelist.h
@@ -27,12 +27,10 @@
 #include "os-linux.h"
 #endif
 
-#define JFFS2_NATIVE_ENDIAN
-
 /* Note we handle mode bits conversion from JFFS2 (i.e. Linux) to/from
whatever OS we're actually running on here too. */
 
-#if defined(JFFS2_NATIVE_ENDIAN)
+#if defined(CONFIG_JFFS2_NATIVE_ENDIAN)
 #define cpu_to_je16(x) ((jint16_t){x})
 #define cpu_to_je32(x) ((jint32_t){x})
 #define cpu_to_jemode(x) ((jmode_t){os_to_jffs2_mode(x)})
@@ -43,7 +41,7 @@
 #define je16_to_cpu(x) ((x).v16)
 #define je32_to_cpu(x) ((x).v32)
 #define jemode_to_cpu(x) (jffs2_to_os_mode((x).m))
-#elif defined(JFFS2_BIG_ENDIAN)
+#elif defined(CONFIG_JFFS2_BIG_ENDIAN)
 #define cpu_to_je16(x) ((jint16_t){cpu_to_be16(x)})
 #define cpu_to_je32(x) ((jint32_t){cpu_to_be32(x)})
 #define cpu_to_jemode(x) ((jmode_t){cpu_to_be32(os_to_jffs2_mode(x))})
@@ -54,7 +52,7 @@
 #define je16_to_cpu(x) (be16_to_cpu(x.v16))
 #define je32_to_cpu(x) (be32_to_cpu(x.v32))
 #define jemode_to_cpu(x) (be32_to_cpu(jffs2_to_os_mode((x).m)))
-#elif defined(JFFS2_LITTLE_ENDIAN)
+#elif defined(CONFIG_JFFS2_LITTLE_ENDIAN)
 #define cpu_to_je16(x) ((jint16_t){cpu_to_le16(x)})
 #define cpu_to_je32(x) ((jint32_t){cpu_to_le32(x)})
 #define cpu_to_jemode(x) ((jmode_t){cpu_to_le32(os_to_jffs2_mode(x))})
-- 
2.19.1
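
For readers following the macro changes above, here is a minimal userspace sketch of how a compile-time endianness choice can select the conversion helpers. It is not the kernel code: every SKETCH_/sketch_ name is an assumption made for illustration, standing in for the CONFIG_JFFS2_* options and for the kernel's cpu_to_be32()/cpu_to_le32() helpers, and only the 32-bit case is shown.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Kconfig choice; define exactly one. */
#define SKETCH_JFFS2_BIG_ENDIAN 1

/* Wrapper type so on-media values cannot be mixed up with CPU values. */
typedef struct { uint32_t v32; } jint32_t;

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t sketch_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* Runtime probe of host byte order (the kernel knows this at build time). */
static inline int sketch_host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 1;
}

/* Swapping is its own inverse, so each helper converts in both directions. */
static inline uint32_t sketch_cpu_to_be32(uint32_t x)
{
	return sketch_host_is_little_endian() ? sketch_swab32(x) : x;
}

static inline uint32_t sketch_cpu_to_le32(uint32_t x)
{
	return sketch_host_is_little_endian() ? x : sketch_swab32(x);
}

#if defined(SKETCH_JFFS2_NATIVE_ENDIAN)
#define cpu_to_je32(x) ((jint32_t){ (x) })
#define je32_to_cpu(x) ((x).v32)
#elif defined(SKETCH_JFFS2_BIG_ENDIAN)
#define cpu_to_je32(x) ((jint32_t){ sketch_cpu_to_be32(x) })
#define je32_to_cpu(x) (sketch_cpu_to_be32((x).v32))
#elif defined(SKETCH_JFFS2_LITTLE_ENDIAN)
#define cpu_to_je32(x) ((jint32_t){ sketch_cpu_to_le32(x) })
#define je32_to_cpu(x) (sketch_cpu_to_le32((x).v32))
#else
#error "select one SKETCH_JFFS2_*_ENDIAN option"
#endif
```

With the big-endian branch selected, the value 0x11223344 should be stored as the bytes 11 22 33 44 regardless of the host byte order, which is the host-independent layout the commit message asks for.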

[RFC 0/2] Add RISC-V cpu topology

2018-11-01 Thread Atish Patra

This patch series adds the cpu topology for RISC-V. It contains
both the DT binding and actual source code. It has been tested on
QEMU & Unleashed board. 

The idea is based on cpu-map in ARM with changes related to how
we define SMT systems. The reason for adopting a similar approach
to ARM is that I feel it provides a very clear way of defining the
topology compared to parsing cache nodes to figure out which cpus
share the same package or core.  I am open to any other idea to
implement cpu-topology as well.

Atish Patra (2):
  dt-bindings: topology: Add RISC-V cpu topology.
  RISC-V: Introduce cpu topology.

 .../devicetree/bindings/riscv/topology.txt | 154 
 arch/riscv/include/asm/topology.h  |  28 +++
 arch/riscv/kernel/Makefile |   1 +
 arch/riscv/kernel/smpboot.c|   5 +-
 arch/riscv/kernel/topology.c   | 194 +
 5 files changed, 381 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/riscv/topology.txt
 create mode 100644 arch/riscv/include/asm/topology.h
 create mode 100644 arch/riscv/kernel/topology.c

-- 
2.7.4

[RFC 2/2] RISC-V: Introduce cpu topology.

2018-11-01 Thread Atish Patra

Currently, cpu topology is not defined for RISC-V.

Parse cpu-topology from a new DT entry "cpu-topology"
to create different cpu sibling maps.
As of now, only bare minimum requirements are implemented
but it is capable of describing any type of topology in future.

CPU topology after applying the patch.
$cat /sys/devices/system/cpu/cpu2/topology/core_siblings_list
0-3
$cat /sys/devices/system/cpu/cpu3/topology/core_siblings_list
0-3
$cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
0
$cat /sys/devices/system/cpu/cpu3/topology/core_id
3

Signed-off-by: Atish Patra 
---
 arch/riscv/include/asm/topology.h |  28 ++
 arch/riscv/kernel/Makefile|   1 +
 arch/riscv/kernel/smpboot.c   |   5 +-
 arch/riscv/kernel/topology.c  | 194 ++
 4 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/topology.h
 create mode 100644 arch/riscv/kernel/topology.c

diff --git a/arch/riscv/include/asm/topology.h b/arch/riscv/include/asm/topology.h
new file mode 100644
index ..d412edc8
--- /dev/null
+++ b/arch/riscv/include/asm/topology.h
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#ifndef __ASM_TOPOLOGY_H
+#define __ASM_TOPOLOGY_H
+
+#include 
+#include 
+
+struct riscv_cpu_topology {
+   int core_id;
+   int package_id;
+   int hart_id;
+   cpumask_t thread_sibling;
+   cpumask_t core_sibling;
+};
+
+extern struct riscv_cpu_topology cpu_topology[NR_CPUS];
+
+#define topology_physical_package_id(cpu)  (cpu_topology[cpu].package_id)
+#define topology_core_id(cpu)  (cpu_topology[cpu].core_id)
+#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
+#define topology_sibling_cpumask(cpu)  (&cpu_topology[cpu].thread_sibling)
+
+void init_cpu_topology(void);
+void remove_cpu_topology(unsigned int cpuid);
+void set_topology_masks(unsigned int cpuid);
+
+#endif /* _ASM_RISCV_TOPOLOGY_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index e1274fc0..128766f8 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -27,6 +27,7 @@ obj-y += riscv_ksyms.o
 obj-y  += stacktrace.o
 obj-y  += vdso.o
 obj-y  += cacheinfo.o
+obj-y  += topology.o
 obj-y  += vdso/
 
 CFLAGS_setup.o := -mcmodel=medany
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index 56abab6a..1324f4b2 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -45,6 +45,7 @@ void __init smp_prepare_boot_cpu(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
+   init_cpu_topology();
 }
 
 void __init setup_smp(void)
@@ -98,13 +99,15 @@ void __init smp_cpus_done(unsigned int max_cpus)
 asmlinkage void __init smp_callin(void)
 {
struct mm_struct *mm = &init_mm;
+   int cpu = smp_processor_id();
 
/* All kernel threads share the same mm context.  */
atomic_inc(&mm->mm_count);
current->active_mm = mm;
 
trap_init();
-   notify_cpu_starting(smp_processor_id());
+   notify_cpu_starting(cpu);
+   set_topology_masks(cpu);
set_cpu_online(smp_processor_id(), 1);
local_flush_tlb_all();
local_irq_enable();
diff --git a/arch/riscv/kernel/topology.c b/arch/riscv/kernel/topology.c
new file mode 100644
index ..5195de14
--- /dev/null
+++ b/arch/riscv/kernel/topology.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Western Digital Corporation or its affiliates.
+ *
+ * Based on the arm64 version arch/arm64/kernel/topology.c
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/*
+ * cpu topology array
+ */
+struct riscv_cpu_topology cpu_topology[NR_CPUS];
+EXPORT_SYMBOL_GPL(cpu_topology);
+
+void set_topology_masks(unsigned int cpuid)
+{
+   struct riscv_cpu_topology *ctopo, *cpuid_topo = &cpu_topology[cpuid];
+   int cpu;
+
+   /* update core and thread sibling masks */
+   for_each_online_cpu(cpu) {
+   ctopo = &cpu_topology[cpu];
+
+   if (cpuid_topo->package_id != ctopo->package_id)
+   continue;
+
+   cpumask_set_cpu(cpuid, &ctopo->core_sibling);
+   cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+
+   if (cpuid_topo->core_id != ctopo->core_id)
+   continue;
+
+   cpumask_set_cpu(cpuid, &ctopo->thread_sibling);
+   cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+   }
+}
+
+static int __init get_hartid_for_cnode(struct device_node *node,
+  unsigned int count)
+{
+   char name[10];
+   struct device_node *cpu_node;
+   int cpu;
+
+   snprintf(name, sizeof(name), "cpu%d", count);
+   cpu_node = of_parse_phandle(node, name, 0);
+   if (!cpu_node)
+   return -1;
+
+   cpu = of_cpu_node_to_id(cpu_node);
+   if (cpu < 0)
+   pr_err("Unable
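
The sibling-mask update in set_topology_masks() above can be tried out in userspace with plain bitmasks. The following is only a sketch under stated assumptions: the sketch_ names, the 8-CPU limit and the use of uint32_t in place of cpumask_t are all made up for illustration, and the sketch marks a hart online before updating the masks so that it shows up in its own sibling masks, which may not match the ordering in the patch.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 8

/* Simplified stand-in for struct riscv_cpu_topology; masks are plain bits. */
struct sketch_cpu_topology {
	int core_id;
	int package_id;
	uint32_t thread_sibling;
	uint32_t core_sibling;
};

static struct sketch_cpu_topology sketch_topology[SKETCH_NR_CPUS];
static uint32_t sketch_online_mask;

/* Mirror of the loop in the patch: walk online CPUs, pair up siblings. */
static void sketch_set_topology_masks(unsigned int cpuid)
{
	struct sketch_cpu_topology *self = &sketch_topology[cpuid];
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		struct sketch_cpu_topology *other = &sketch_topology[cpu];

		if (!(sketch_online_mask & (1u << cpu)))
			continue;
		if (self->package_id != other->package_id)
			continue;

		/* Same package: both CPUs are core siblings of each other. */
		other->core_sibling |= 1u << cpuid;
		self->core_sibling |= 1u << cpu;

		if (self->core_id != other->core_id)
			continue;

		/* Same core as well: they are hardware-thread siblings. */
		other->thread_sibling |= 1u << cpuid;
		self->thread_sibling |= 1u << cpu;
	}
}

/* Hypothetical bring-up helper: record the ids, go online, update masks. */
static void sketch_cpu_up(unsigned int cpu, int package_id, int core_id)
{
	sketch_topology[cpu].package_id = package_id;
	sketch_topology[cpu].core_id = core_id;
	sketch_online_mask |= 1u << cpu;
	sketch_set_topology_masks(cpu);
}
```

Bringing up four harts in package 0 with core ids 0 to 3 leaves every hart with a core_sibling mask covering CPUs 0-3 and a thread_sibling mask containing only itself, which lines up with the core_siblings_list output quoted in the commit message.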

[RFC 1/2] dt-bindings: topology: Add RISC-V cpu topology.

2018-11-01 Thread Atish Patra

Define a RISC-V cpu topology. This is based on cpu-map in ARM world.
But it doesn't need a separate thread node for defining SMT systems.
Multiple cpu phandle properties can be parsed to identify the sibling
hardware threads. Moreover, we do not have a cluster concept in RISC-V.
So package is a better word choice than cluster for RISC-V.

Signed-off-by: Atish Patra 
---
 .../devicetree/bindings/riscv/topology.txt | 154 +
 1 file changed, 154 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/riscv/topology.txt

diff --git a/Documentation/devicetree/bindings/riscv/topology.txt b/Documentation/devicetree/bindings/riscv/topology.txt
new file mode 100644
index ..96039ed3
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/topology.txt
@@ -0,0 +1,154 @@
+===
+RISC-V cpu topology binding description
+===
+
+===
+1 - Introduction
+===
+
+In a RISC-V system, the hierarchy of CPUs can be defined through following nodes that
+are used to describe the layout of physical CPUs in the system:
+
+- packages
+- core
+
+The cpu nodes (bindings defined in [1]) represent the devices that
+correspond to physical CPUs and are to be mapped to the hierarchy levels.
+Simultaneous multi-threading (SMT) systems can also represent their topology
+by defining multiple cpu phandles inside core node. The details are explained
+in paragraph 3.
+
+The remainder of this document provides the topology bindings for RISC-V, based
+on the Devicetree Specification, available from:
+
+https://www.devicetree.org/specifications/
+
+If not stated otherwise, whenever a reference to a cpu node phandle is made its
+value must point to a cpu node compliant with the cpu node bindings as
+documented in [1].
+A topology description containing phandles to cpu nodes that are not compliant
+with bindings standardized in [1] is therefore considered invalid.
+
+This cpu topology binding description is mostly based on the topology defined
+in ARM [2].
+===
+2 - cpu-topology node
+===
+
+The RISC-V CPU topology is defined within the "cpu-topology" node, which is a direct
+child of the "cpus" node and provides a container where the actual topology
+nodes are listed.
+
+- cpu-topology node
+
+   Usage: Optional - RISC-V SMP systems need to provide CPUs topology to
+ the OS. RISC-V uniprocessor systems do not require a
+ topology description and therefore should not define a
+ cpu-topology node.
+
+   Description: The cpu-topology node is just a container node where its
+subnodes describe the CPU topology.
+
+   Node name must be "cpu-topology".
+
+   The cpu-topology node's parent node must be the cpus node.
+
+   The cpu-topology node's child nodes can be:
+
+   - one or more package nodes
+
+   Any other configuration is considered invalid.
+
+The cpu-topology node can only contain two types of child nodes:
+
+- package node
+- core node
+
+whose bindings are described in paragraph 3.
+
+=
+2.1 - cpu-topology child nodes naming convention
+=
+
+cpu-topology child nodes must follow a naming convention where the node name
+must be "packageN", "coreN" depending on the node type (i.e. package/core).
+For SMT systems, coreN node can contain several cpuN to indicate individual
+SMT harts (where N = {0, 1, ...} is the node number; nodes which are siblings
+within a single common parent node must be given a unique and sequential N
+value, starting from 0). cpu-topology child nodes which do not share a common
+parent node can have the same name (i.e. same number N as other cpu-topology
+child nodes at different device tree levels) since name uniqueness will be
+guaranteed by the device tree hierarchy.
+
+===
+3 - package/core node bindings
+===
+
+Bindings for package/core nodes are defined as follows:
+
+- package node
+
+Description: must be declared within a cpu-topology node, one node
+ per package. A system can contain several layers of
+ package nodes. It can also be contained in parent
+ package nodes.
+
+   The package node name must be "packageN" as described in 2.1 above.
+   A package node can not be a leaf node.
+
+   A package node's child nodes must be:
+
+   - one or more package nodes; or
+   - one or more core nodes
+
+   Any other configuration is considered invalid.
+
+- core node
+
+   Description: must be declared in a package node, one node per core in
+

rcu: enable rcu_normal_after_boot by default for RT

2018-11-01 Thread Paul E. McKenney

> The forcing of an expedited grace period is an expensive and very
> RT-application unfriendly operation, as it forcibly preempts all running
> tasks on CPUs which are preventing the gp from expiring.
> 
> By default, as a policy decision, disable the expediting of grace
> periods (after boot) on configurations which enable PREEMPT_RT_FULL.
> 
> Suggested-by: Luiz Capitulino 
> Signed-off-by: Julia Cartwright 
> Signed-off-by: Sebastian Andrzej Siewior 

In case it matters:

Acked-by: Paul E. McKenney 

Alternatively, any reason that I should not pull this into -rcu?

Thanx, Paul

> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index f56c0fbdf22e..12027723abaf 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -67,7 +67,7 @@ extern int rcu_expedited; /* from sysctl */
>  module_param(rcu_expedited, int, 0);
>  extern int rcu_normal; /* from sysctl */
>  module_param(rcu_normal, int, 0);
> -static int rcu_normal_after_boot;
> +static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT_FULL);
>  module_param(rcu_normal_after_boot, int, 0);
>  #endif /* #ifndef CONFIG_TINY_RCU */
>
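
As a side note on the one-line change above: IS_ENABLED() evaluates to 1 or 0 at compile time, so it can be used directly as a static initializer. The sketch below reproduces the preprocessor trick in userspace so the effect on the default can be seen. All SKETCH_/sketch_ names are assumptions for illustration, and unlike the kernel macro this simplified version only handles options defined to 1 (the built-in case), not modules.

```c
/* Hypothetical stand-in for CONFIG_PREEMPT_RT_FULL=y; remove to model =n. */
#define SKETCH_CONFIG_PREEMPT_RT_FULL 1

/*
 * If the option is defined to 1, the token paste below yields
 * SKETCH_ARG_PLACEHOLDER_1, which expands to "0," and shifts the literal 1
 * into the second argument slot. Otherwise the paste yields a junk
 * identifier and the second argument is the trailing 0.
 */
#define SKETCH_ARG_PLACEHOLDER_1 0,
#define SKETCH_TAKE_SECOND_ARG(ignored, val, ...) val
#define SKETCH_IS_ENABLED(option) SKETCH_IS_ENABLED_(option)
#define SKETCH_IS_ENABLED_(value) SKETCH_IS_ENABLED__(SKETCH_ARG_PLACEHOLDER_##value)
#define SKETCH_IS_ENABLED__(arg1_or_junk) SKETCH_TAKE_SECOND_ARG(arg1_or_junk 1, 0)

/* Default follows the config option, as in the patch. */
static int sketch_rcu_normal_after_boot =
	SKETCH_IS_ENABLED(SKETCH_CONFIG_PREEMPT_RT_FULL);

/* An option that is not defined anywhere yields 0. */
static int sketch_other_default = SKETCH_IS_ENABLED(SKETCH_CONFIG_NOT_SET);
```

Because the value is only a default for a module parameter, it can presumably still be overridden at boot; the sketch only models how the initial value is chosen.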

Re: [PATCH] arm64: kdump: fix small typo

2018-11-01 Thread Will Deacon

On Thu, Nov 01, 2018 at 11:25:31AM -0400, Yangtao Li wrote:
> Signed-off-by: Yangtao Li 
> ---
>  arch/arm64/kernel/crash_dump.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> index f46d57c31443..6b5037ed15b2 100644
> --- a/arch/arm64/kernel/crash_dump.c
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -58,7 +58,7 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>  /**
>   * elfcorehdr_read - read from ELF core header
>   * @buf: buffer where the data is placed
> - * @csize: number of bytes to read
> + * @count: number of bytes to read

I know this is trivial, but please can you include a short commit message
saying that this brings the kerneldoc in line with the function signature?

Thanks,

Will

Miss Aminata musa ibrahim from Libya

2018-11-01 Thread Miss Amina musa ibrahim





--
Miss Aminata musa ibrahim from Libya, I am 22 years old, I am in St. 
Christopher's Parish for refugee in Burkina Faso under United Nations 
High commission for Refugee ,I lost my parents in the recent war in 
Libya, right now am in Burkina Faso, please save my life i am in danger 
need your help in transferring my inheritance my father left behind for 
me in a Bank in Burkina Faso here,i have every document for the 
transfer,Emai:missaminatamu...@gmail.com

Miss Aminata musa ibrahim.
--

Re: [GIT PULL] security: keys updates for v4.20

2018-11-01 Thread Linus Torvalds

On Fri, Oct 26, 2018 at 2:36 AM James Morris  wrote:
>
> From David: "Provide five new operations in the key_type struct that can
> be used to provide access to asymmetric key operations.  These will be
> implemented for the asymmetric key type in a later patch and may refer to
> a key retained in RAM by the kernel or a key retained in crypto hardware.

Pulled.

However. I really would have expected some of the TPM people to have
acked these, or at least looked at them. There was no sign of that in
any of the tpm commits that I could see..

Hmm?

   Linus

Re: [PATCH 4.9 23/35] x86/mm: Expand static page table for fixmap space

2018-11-01 Thread Ben Hutchings

On Thu, 2018-10-11 at 17:35 +0200, Greg Kroah-Hartman wrote:
> 4.9-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: Feng Tang 
> 
> commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.

This backport is incorrect.  The part that updated __startup_64() in
arch/x86/kernel/head64.c was dropped, presumably because that function
doesn't exist in 4.9.  However, that seems to be an essential part of the
fix.  In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would
need to be changed instead.

I also found that this introduces new boot-time warnings on some
systems if CONFIG_DEBUG_WX is enabled.

So, unless someone provides fixes for those issues, I think this should
be reverted for the 4.9 branch.

Ben.

> We met a kernel panic when enabling earlycon, which is due to the fixmap
> address of earlycon not being statically set up.
> 
> Currently the static fixmap setup in head_64.S only covers 2M virtual
> address space, while it actually could be in 4M space with different
> kernel configurations, e.g. when VSYSCALL emulation is disabled.
> 
> So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
> and add a build time check to ensure that the fixmap is covered by the
> initial static page tables.
> 
> Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable")
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Feng Tang 
> Signed-off-by: Thomas Gleixner 
> Tested-by: kernel test robot 
> Reviewed-by: Juergen Gross  (Xen parts)
> Cc: H Peter Anvin 
> Cc: Peter Zijlstra 
> Cc: Michal Hocko 
> Cc: Yinghai Lu 
> Cc: Dave Hansen 
> Cc: Andi Kleen 
> Cc: Andy Lutomirsky 
> Cc: sta...@vger.kernel.org
> Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.t...@intel.com
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  arch/x86/include/asm/fixmap.h |   10 ++
>  arch/x86/include/asm/pgtable_64.h |3 ++-
>  arch/x86/kernel/head_64.S |   16 
>  arch/x86/mm/pgtable.c |9 +
>  arch/x86/xen/mmu.c|8 ++--
>  5 files changed, 39 insertions(+), 7 deletions(-)
> 
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -14,6 +14,16 @@
>  #ifndef _ASM_X86_FIXMAP_H
>  #define _ASM_X86_FIXMAP_H
>  
> +/*
> + * Exposed to assembly code for setting up initial page tables. Cannot be
> + * calculated in assembly code (fixmap entries are an enum), but is sanity
> + * checked in the actual fixmap C code to make sure that the fixmap is
> + * covered fully.
> + */
> +#define FIXMAP_PMD_NUM   2
> +/* fixmap starts downwards from the 507th entry in level2_fixmap_pgt */
> +#define FIXMAP_PMD_TOP   507
> +
>  #ifndef __ASSEMBLY__
>  #include 
>  #include 
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -13,13 +13,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  extern pud_t level3_kernel_pgt[512];
>  extern pud_t level3_ident_pgt[512];
>  extern pmd_t level2_kernel_pgt[512];
>  extern pmd_t level2_fixmap_pgt[512];
>  extern pmd_t level2_ident_pgt[512];
> -extern pte_t level1_fixmap_pgt[512];
> +extern pte_t level1_fixmap_pgt[512 * FIXMAP_PMD_NUM];
>  extern pgd_t init_level4_pgt[];
>  
>  #define swapper_pg_dir init_level4_pgt
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -23,6 +23,7 @@
>  #include "../entry/calling.h"
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_PARAVIRT
>  #include 
> @@ -493,13 +494,20 @@ NEXT_PAGE(level2_kernel_pgt)
>   KERNEL_IMAGE_SIZE/PMD_SIZE)
>  
>  NEXT_PAGE(level2_fixmap_pgt)
> - .fill   506,8,0
> - .quad   level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
> - /* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
> - .fill   5,8,0
> + .fill   (512 - 4 - FIXMAP_PMD_NUM),8,0
> + pgtno = 0
> + .rept (FIXMAP_PMD_NUM)
> + .quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map \
> + + _PAGE_TABLE;
> + pgtno = pgtno + 1
> + .endr
> + /* 6 MB reserved space + a 2MB hole */
> + .fill   4,8,0
>  
>  NEXT_PAGE(level1_fixmap_pgt)
> + .rept (FIXMAP_PMD_NUM)
>   .fill   512,8,0
> + .endr
>  
>  #undef PMDS
>  
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -536,6 +536,15 @@ void __native_set_fixmap(enum fixed_addr
>  {
>   unsigned long address = __fix_to_virt(idx);
>  
> +#ifdef CONFIG_X86_64
> +   /*
> + * Ensure that the static initial page tables are covering the
> + * fixmap completely.
> + */
> + BUILD_BUG_ON(__end_of_permanent_fixed_addresses >
> +  (FIXMAP_PMD_NUM * PTRS_PER_PTE));
> +#endif
> +
>   if (idx >= __end_of_fixed_addresses) {
>   BUG();
>   return;
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1936,7 +1936,7 @@ void __init xen_setup_kernel_pagetable(p
>* L3_k[511] -> level2_fixm

Re: [RFC PATCH for 4.21 03/16] mm: Replace BUG_ON() by WARN_ON() in vm_unmap_ram()

2018-11-01 Thread Mathieu Desnoyers

- On Nov 1, 2018, at 11:00 PM, Linus Torvalds torva...@linux-foundation.org wrote:

> On Thu, Nov 1, 2018 at 12:57 PM Mathieu Desnoyers
>  wrote:
>>
>> > I think the graceful recovery is to simply return:
>> >
>> >   if (WARN_ON(cond))
>> >   return;
>> >
>> > is better than just
>> >
>> >   BUG_ON(cond);
>> >
>> > As that's what Linus made pretty clear at the Maintainer's Summit.
>>
>> That's it. For an unmap function, this basically boils down to
>> print a warning and leak the memory on internal unmap error.
>>
>> I will update the commit message describing this behavior.
> 
> It might be even better to use WARN_ON_ONCE().
> 
> If it's a "this shouldn't happen" situation, the advantage of
> WARN_ON_ONCE() is that it will still show the backtrace of the "how
> the heck did it happen after all" situation, but if it turns out to
> be user-triggerable (or simply triggerable by some odd hw situation),
> it won't spam your logs forever.
> 
> Obviously, things like rate limiting etc can also be good ideas, but
> that's just overkill for "this really should never happen" cases.
> 
> (Side note: WARN_ON_ONCE() will _warn_ just once, but will always
> return the condition that it warns for, so the return value is _not_
> "I have warned", but "I have seen the condition that I should warn
> about". Just in case people are worried about it).

All right, I'll update this patch (and the following one implementing
vm_{map,unmap}_user_ram) to use WARN_ON_ONCE().

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
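
To make the semantics discussed in this thread concrete, here is a small userspace model of the "warn once, but always report the condition" behaviour and of the early-return recovery pattern. It is a sketch under stated assumptions: the real WARN_ON_ONCE() is a macro with per-call-site state and a backtrace, whereas the sketch_ names, the explicit site structure and the NULL check below are invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Per-call-site state; the kernel macro keeps this in a hidden static. */
struct sketch_warn_site {
	bool warned;
	unsigned int printed;
};

/*
 * Print the warning only the first time the condition is seen, but always
 * return the condition itself, so callers can still bail out every time.
 */
static bool sketch_warn_on_once(struct sketch_warn_site *site, bool cond,
				const char *what)
{
	if (cond && !site->warned) {
		site->warned = true;
		site->printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static struct sketch_warn_site sketch_unmap_site;
static unsigned int sketch_unmapped;

/* Hypothetical unmap routine: on a bad argument, warn once and leak. */
static void sketch_unmap(const void *addr)
{
	if (sketch_warn_on_once(&sketch_unmap_site, addr == NULL,
				"unmap called with NULL address"))
		return;

	sketch_unmapped++;
}
```

Calling sketch_unmap(NULL) twice prints a single warning, yet both calls return early, which is the point of the side note in the quoted mail: the return value means "the condition was seen", not "a warning was printed".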

PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

2018-11-01 Thread Milian Wolff

On Dienstag, 30. Oktober 2018 23:34:35 CET Milian Wolff wrote:
> On Mittwoch, 24. Oktober 2018 16:48:18 CET Andi Kleen wrote:
> > > Can someone at least confirm whether unwinding from a function prologue
> > > via
> > > .eh_frame (but without .debug_frame) should actually be possible?
> > 
> > Yes it should be possible. Asynchronous unwind tables should work
> > from any instruction.



> We can find "7f91345bdaf8+1 = 7f91345bdaf9" at offset 16 (search for "f9 da
> 5b 34 91 7f"). Using that address makes unwinding work for this sample.
> What could be the reason for this shift?

I believe I have found the culprit: PEBS seems to be at fault here - i.e. the 
RIP/RSP and the ustack dump of the sample simply don't fit together.

Check this out:

```
$ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
37
39
35
28
40
39
29
37
31
26

$ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
79
70
76
77
70
90
64
78
86
74
```

Note how precise levels 0 and 1 do not produce any samples where unwinding 
fails. But precise level 2 produces some, and precise level 3 increases the 
amount (by roughly 2x).

I can reproduce this pattern on two separate Intel CPUs and kernel versions 
currently:

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts

Could someone else try this? What about AMD and IBS - is it also affected? 
What about newer/different Intel CPUs?

Better yet, can someone come up with a fix for this on Intel with maximum 
precise level?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


Re: [git pull] mount API series

2018-11-01 Thread Linus Torvalds

On Thu, Nov 1, 2018 at 3:05 PM Al Viro  wrote:
>
>  Do you mind if we end up with work.mount rebased?
> The usual objections re testing in -next do not apply in this case,
> AFAICS...

I was assuming that the work.mount branch would be entirely re-done, yes.

  Linus

Re: [RFC PATCH for 4.21 03/16] mm: Replace BUG_ON() by WARN_ON() in vm_unmap_ram()

2018-11-01 Thread Linus Torvalds

On Thu, Nov 1, 2018 at 12:57 PM Mathieu Desnoyers
 wrote:
>
> > I think the graceful recovery is to simply return:
> >
> >   if (WARN_ON(cond))
> >   return;
> >
> > is better than just
> >
> >   BUG_ON(cond);
> >
> > As that's what Linus made pretty clear at the Maintainer's Summit.
>
> That's it. For an unmap function, this basically boils down to
> print a warning and leak the memory on internal unmap error.
>
> I will update the commit message describing this behavior.

It might be even better to use WARN_ON_ONCE().

If it's a "this shouldn't happen" situation, the advantage of
WARN_ON_ONCE() is that it will still show the backtrace of the "how
the heck did it happen after all" situation, but if it turns out to
be user-triggerable (or simply triggerable by some odd hw situation),
it won't spam your logs forever.

Obviously, things like rate limiting etc can also be good ideas, but
that's just overkill for "this really should never happen" cases.

(Side note: WARN_ON_ONCE() will _warn_ just once, but will always
return the condition that it warns for, so the return value is _not_
"I have warned", but "I have seen the condition that I should warn
about". Just in case people are worried about it).
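
A minimal userspace sketch of that return-value behaviour, assuming GNU C
statement expressions (GCC/Clang). MODEL_WARN_ON_ONCE, model_unmap and
count_bailouts are made-up names for illustration; this is not the kernel's
implementation, only a model of "warn once, but report the condition every
time":

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings printed so far; exists only so the sketch can be checked. */
static int warnings_printed;

/*
 * Hypothetical userspace model of WARN_ON_ONCE().
 * Each expansion gets its own static flag, so the message is printed at
 * most once per call site, yet the macro always evaluates to the truth
 * value of the condition.
 */
#define MODEL_WARN_ON_ONCE(cond) ({ \
	static bool warned_; \
	bool hit_ = !!(cond); \
	if (hit_ && !warned_) { \
		warned_ = true; \
		warnings_printed++; \
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	} \
	hit_; \
})

/* Returns true if the unmap was skipped (memory leaked) because of bad input. */
static bool model_unmap(const void *mem)
{
	if (MODEL_WARN_ON_ONCE(mem == NULL))
		return true;	/* bail out gracefully instead of BUG_ON() */
	return false;
}

/* Calls model_unmap() `calls` times with a bad pointer; returns how many bailed out. */
static int count_bailouts(int calls)
{
	int bailed = 0;

	for (int i = 0; i < calls; i++)
		bailed += model_unmap(NULL);
	/* Every call must take the early return, warned or not. */
	assert(bailed == calls);
	return bailed;
}
```

Calling count_bailouts(3) takes the early return three times while bumping
warnings_printed only once, which is the property the side note is about.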

  Linus

Re: [PATCH 01/25] amd-gpu: Don't undefine READ and WRITE [ver #2]

2018-11-01 Thread Pavel Machek

On Wed 2018-10-24 00:57:50, David Howells wrote:
> Remove the undefinition of READ and WRITE because these constants may be
> used elsewhere in subsequently included header files, thus breaking them.
> 
> These constants don't actually appear to be used in the driver, so the
> undefinition seems pointless.
> 
> Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
> Signed-off-by: David Howells 
> ---
> 
>  drivers/gpu/drm/amd/display/dc/os_types.h |2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h b/drivers/gpu/drm/amd/display/dc/os_types.h
> index a407892905af..c0d9f332baed 100644
> --- a/drivers/gpu/drm/amd/display/dc/os_types.h
> +++ b/drivers/gpu/drm/amd/display/dc/os_types.h
> @@ -40,8 +40,6 @@
>  #define LITTLEENDIAN_CPU
>  #endif
>  
> -#undef READ
> -#undef WRITE
>  #undef FRAME_SIZE
>  

While you are at it... is undefining FRAME_SIZE a good idea? It seems
like this is another bug waiting to be discovered.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [git pull] mount API series

2018-11-01 Thread Al Viro

On Thu, Nov 01, 2018 at 11:33:31AM -0700, Linus Torvalds wrote:

> Al - can I ask you to look at helping David with something like that?
> You tend to be very good at generating those patch-series with
> "obviously no changes" for the individual patches, but the end result
> ends up being totally different from the starting point (I'm thinking
> of all the locking and dentry refcounting series).

I'll try.  Before we go there, I'd like to get the rest of vfs.git off
my hands - AFS series and misc pile.  Will send pull requests shortly,
then - this stuff.  Do you mind if we end up with work.mount rebased?
The usual objections re testing in -next do not apply in this case,
AFAICS...

Re: [PATCH 0/8] OLPC 1.75 Keyboard/Touchpad fixes

2018-11-01 Thread Pavel Machek

Hi!

> This makes keyboard/touchpad work on a DT MMP2 platform.

For the series:

Acked-by: Pavel Machek 
Tested-by: Pavel Machek 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [GIT PULL] overlayfs update for 4.20

2018-11-01 Thread Linus Torvalds

On Thu, Nov 1, 2018 at 2:06 PM Miklos Szeredi  wrote:
>
> This contains a mix of fixes and cleanups.

Pulled,

 Linus

Re: [GIT PULL] UBIFS updates for 4.20-rc1

2018-11-01 Thread Richard Weinberger

Linus,

On Wed, Oct 31, 2018 at 10:22 PM Richard Weinberger  wrote:
> The following changes since commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d:
>
>   Linux 4.19 (2018-10-22 07:37:37 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/linux-ubifs.git tags/tags/upstream-4.20-rc1

Just a kind ping to make sure that this pull request didn't get lost
in your spam folder.
Since you pulled the UML stuff but not this, I'm a little worried.

-- 
Thanks,
//richard

Re: [PATCH anybus v2 1/5] misc: support the Arcx anybus bridge.

2018-11-01 Thread Linus Walleij

On Thu, Nov 1, 2018 at 6:17 PM Sven Van Asbroeck  wrote:

> >> +static DEVICE_ATTR_RO(version);
> >
> > Do you need this in userspace really?
> >
> >> +static DEVICE_ATTR_RO(design_number);
> >
> > And this?
>
> Unfortunately, I do :(
> The application software reads these out and displays them in a UI. It's
> important to be able to see these on a running device.

OK...

> Perhaps there is another kernel abstraction I could use?

I don't think so. If you want to be pedantic, document the sysfs
files in Documentation/ABI/testing/sysfs-*

Maybe the properties should be on the bus though? I don't know :/

Yours,
Linus Walleij

Re: [Ksummit-discuss] Call to Action Re: [PATCH 0/7] Code of Conduct: Fix some wording, and add an interpretation document

2018-11-01 Thread NeilBrown

On Thu, Nov 01 2018, Paul E. McKenney wrote:

> On Sat, Oct 27, 2018 at 02:10:10AM +0100, Josh Triplett wrote:
>> On Fri, Oct 26, 2018 at 08:14:51AM +1100, NeilBrown wrote:
>> > On Wed, Oct 24 2018, Josh Triplett wrote:
>> > 
>> > > On Tue, Oct 23, 2018 at 07:26:06AM +1100, NeilBrown wrote:
>> > >> On Sun, Oct 21 2018, Josh Triplett wrote:
>> > >> 
>> > >> > On Mon, Oct 22, 2018 at 08:20:11AM +1100, NeilBrown wrote:
>> > >> >> I call on you, Greg:
>> > >> >>  - to abandon this divisive attempt to impose a "Code of Conduct"
>> > >> >>  - to revert 8a104f8b5867c68
>> > >> >>  - to return to your core competence of building a great team around
>> > >> >>a great kernel
>> > >> >> 
>> > >> >>  #Isupportreversion
>> > >> >> 
>> > >> >> I call on the community to consider what *does* need to be said, about
>> > >> >> conduct, to people outside the community and who have recently joined.
>> > >> >> What is the document that you would have liked to have read as you were
>> > >> >> starting out?  It is all too long ago for me to remember clearly, and so
>> > >> >> much has changed.
>> > >> >
>> > >> > The document I would have liked to have read when starting out is
>> > >> > currently checked into the source tree in
>> > >> > Documentation/process/code-of-conduct.rst .
>> > >> 
>> > >> I'm curious - what would you have gained by reading that document?
>> > >
>> > > I would have then had rather less of a pervasive feeling of "if I make
>> > > even a single mistake I get made an example of in ways that will feed
>> > > people's quotes files for years to come".
>> > 
>> > Thanks for your reply.  Certainly feeling safe is important, and having
>> > clear statements that the community values and promotes psychological
>> > safety is valuable.
>> > 
>> > The old "code of conflict" said
>> >If however, anyone feels personally abused, threatened, or otherwise
>> >uncomfortable due to this process, that is not acceptable. 
>> > 
>> > would you have not found this a strong enough statement to ward off that
>> > pervasive feeling?
>> 
>> Not when that document started out effectively saying, in an elaborate
>> way, "code > people".
>
> Interesting.
>
> I am curious what leads you to your "code > people" statement.  Of course,
> one could argue that this does not really matter given that the code of
> conflict is no longer.  However, I would like to understand for future
> reference, if for no other reason.
>
> One possibility is that you are restricting the "people" to only those
> people directly contributing in one way or another.  But those using the
> kernel (both directly and indirectly) are important as well, and it is
> exactly this group that is served by "the most robust operating system
> kernel ever", the chest-beating sentiment notwithstanding.  Which is in
> fact why I must reject (or rework or whatever) any patch that might result
> in too-short RCU grace periods:  The needs of the patch's submitter are
> quite emphatically outweighed by the needs of the kernel's many users,
> and many of the various technical requirements and restrictions are in
> fact proxies for the needs of these users.
>
> But you knew that already.
>
> Similarly for the Linux kernel's various code-style strictures, which
> serve the surprisingly large group of people reading the kernel's code.
> Including the stricture that I most love to hate, which is the one
> stating that single-line do/for/if/while statements must not be enclosed
> in braces, which sometimes causes me trouble when inserting debug code,
> but which also makes more code fit into a window of a given size.  ;-)
>
> But you knew that already, too.
>
> The maintainability requirements can be argued to mostly serve the
> maintainers, but if the code becomes unmaintainable, future users
> will be inconvenienced, to say the least.  So even the maintainability
> requirements serve the kernel's many users.
>
> But you also knew that already.
>
> So what am I missing here?
>

Hi Paul,
 thanks for contributing your thoughts.  It is nice to have a new voice
 in the conversation; it helps me to maintain my illusion that this
 issue is relevant to the whole community.

 I cannot, of course, speak to why Josh wrote what he did, but I can
 give some insight into why I had no disagreement with that part of his
 statement.
 A key insight, worth your time to consider and unpack I think, is

  People won't care what you know, until they know that you care.

 I won't dwell on that here, but will make some more obviously relevant
 observations.

 Firstly, you gave an analytical response to what was, in my view, an
 emotional observation.  While I agree with your analysis, it is largely
 irrelevant.  It is not how people *feel* about kernel development.

 You say that the code of conflict is gone, but in fact much of it is
 preserved in the code-of-conduct-interpretation.  If you reflect on the
 focus of the second para of that d

[RFC PATCH 2/6] shiftfs: map inodes to lower fs inodes instead of dentries

2018-11-01 Thread Seth Forshee

Since shiftfs inodes map to dentries in the lower fs, two links
to the same lowerfs inode create separate inodes in shiftfs. This
causes problems for inotify, as a watch on one of these files in
shiftfs will not see changes made to the underlying inode via the
other file.

Fix this by updating shiftfs to map its inodes to corresponding
inodes in the lower fs. Inodes are cached using the pointer to
the lower fs inode as the hash value. This fixes a second inotify
problem whereby a watch is set on an inode, the dentry is evicted
from the cache, and events on a new dentry are not reported back
to the watch on the original inode.

Signed-off-by: Seth Forshee 
---
 fs/shiftfs.c | 105 ++-
 1 file changed, 79 insertions(+), 26 deletions(-)

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 6028244c2f42..b179a1be7bc1 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -22,6 +22,7 @@ struct shiftfs_super_info {
 
 static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode,
   struct dentry *dentry);
+static void shiftfs_init_inode(struct inode *inode, umode_t mode);
 
 enum {
OPT_MARK,
@@ -278,15 +279,27 @@ static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry)
inode->i_opflags |= IOP_NOFOLLOW;
 
inode->i_mapping = reali->i_mapping;
-   inode->i_private = dentry;
+   inode->i_private = reali;
+   set_nlink(inode, reali->i_nlink);
+}
+
+static int shiftfs_inode_test(struct inode *inode, void *data)
+{
+   return inode->i_private == data;
+}
+
+static int shiftfs_inode_set(struct inode *inode, void *data)
+{
+   inode->i_private = data;
+   return 0;
 }
 
 static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
   umode_t mode, const char *symlink,
   struct dentry *hardlink, bool excl)
 {
-   struct dentry *real = dir->i_private, *new = dentry->d_fsdata;
-   struct inode *reali = real->d_inode, *newi;
+   struct dentry *new = dentry->d_fsdata;
+   struct inode *reali = dir->i_private, *inode, *newi;
const struct inode_operations *iop = reali->i_op;
int err;
const struct cred *oldcred, *newcred;
@@ -310,9 +323,14 @@ static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
return -EINVAL;
 
 
-   newi = shiftfs_new_inode(dentry->d_sb, mode, NULL);
-   if (!newi)
-   return -ENOMEM;
+   if (hardlink) {
+   inode = d_inode(hardlink);
+   ihold(inode);
+   } else {
+   inode = shiftfs_new_inode(dentry->d_sb, mode, NULL);
+   if (!inode)
+   return -ENOMEM;
+   }
 
oldcred = shiftfs_new_creds(&newcred, dentry->d_sb);
 
@@ -341,16 +359,33 @@ static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
if (err)
goto out_dput;
 
-   shiftfs_fill_inode(newi, new);
+   if (hardlink) {
+   WARN_ON(inode->i_private != new->d_inode);
+   inc_nlink(inode);
+   } else {
+   shiftfs_fill_inode(inode, new);
+
+   newi = inode_insert5(inode, (unsigned long)new->d_inode,
+shiftfs_inode_test, shiftfs_inode_set,
+new->d_inode);
+   if (newi != inode) {
+   pr_warn_ratelimited("shiftfs: newly created inode found in cache\n");
+   iput(inode);
+   inode = newi;
+   }
+   }
+
+   if (inode->i_state & I_NEW)
+   unlock_new_inode(inode);
 
-   d_instantiate(dentry, newi);
+   d_instantiate(dentry, inode);
 
new = NULL;
-   newi = NULL;
+   inode = NULL;
 
  out_dput:
dput(new);
-   iput(newi);
+   iput(inode);
inode_unlock(reali);
 
return err;
@@ -386,8 +421,8 @@ static int shiftfs_symlink(struct inode *dir, struct dentry *dentry,
 
 static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir)
 {
-   struct dentry *real = dir->i_private, *new = dentry->d_fsdata;
-   struct inode *reali = real->d_inode;
+   struct dentry *new = dentry->d_fsdata;
+   struct inode *reali = dir->i_private;
int err;
const struct cred *oldcred, *newcred;
 
@@ -400,6 +435,13 @@ static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir)
else
err = vfs_unlink(reali, new, NULL);
 
+   if (!err) {
+   if (rmdir)
+   clear_nlink(d_inode(dentry));
+   else
+   drop_nlink(d_inode(dentry));
+   }
+
shiftfs_old_creds(oldcred, &newcred);
inode_unlock(reali);
 
@@ -420,7 +462,8 @@ static int shiftfs_rename(struct inode *olddir, struct dentry *old,

[RFC PATCH 3/6] shiftfs: copy inode attrs up from underlying fs

2018-11-01 Thread Seth Forshee

Not all inode permission checks go through the permission
callback, e.g. some checks related to file capabilities. Always
copy up the inode attrs to ensure these checks work as expected.

Also introduce helpers for shifting kernel ids from one
user ns to another, as this is an operation that is going to be
repeated.

Signed-off-by: Seth Forshee 
---
 fs/shiftfs.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index b179a1be7bc1..556594988dd2 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -266,6 +266,33 @@ static int shiftfs_xattr_set(const struct xattr_handler *handler,
return shiftfs_setxattr(dentry, inode, name, value, size, flags);
 }
 
+static kuid_t shift_kuid(struct user_namespace *from, struct user_namespace *to,
+kuid_t kuid)
+{
+   uid_t uid = from_kuid(from, kuid);
+   return make_kuid(to, uid);
+}
+
+static kgid_t shift_kgid(struct user_namespace *from, struct user_namespace *to,
+kgid_t kgid)
+{
+   gid_t gid = from_kgid(from, kgid);
+   return make_kgid(to, gid);
+}
+
+static void shiftfs_copyattr(struct inode *from, struct inode *to)
+{
+   struct user_namespace *from_ns = from->i_sb->s_user_ns;
+   struct user_namespace *to_ns = to->i_sb->s_user_ns;
+
+   to->i_uid = shift_kuid(from_ns, to_ns, from->i_uid);
+   to->i_gid = shift_kgid(from_ns, to_ns, from->i_gid);
+   to->i_mode = from->i_mode;
+   to->i_atime = from->i_atime;
+   to->i_mtime = from->i_mtime;
+   to->i_ctime = from->i_ctime;
+}
+
 static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry)
 {
struct inode *reali;
@@ -278,6 +305,7 @@ static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry)
if (!reali->i_op->get_link)
inode->i_opflags |= IOP_NOFOLLOW;
 
+   shiftfs_copyattr(reali, inode);
inode->i_mapping = reali->i_mapping;
inode->i_private = reali;
set_nlink(inode, reali->i_nlink);
@@ -573,7 +601,7 @@ static int shiftfs_setattr(struct dentry *dentry, struct iattr *attr)
return err;
 
/* all OK, reflect the change on our inode */
-   setattr_copy(d_inode(dentry), attr);
+   shiftfs_copyattr(reali, d_inode(dentry));
return 0;
 }
 
-- 
2.19.1

[RFC PATCH 4/6] shiftfs: translate uids using s_user_ns from lower fs

2018-11-01 Thread Seth Forshee

Do not assume that ids from the lower filesystem are from
init_user_ns. Instead, translate them from that filesystem's
s_user_ns and then to the shiftfs user ns.
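
A rough userspace model of that two-step translation, assuming each namespace
has a single id-map extent. struct model_userns and the model_* functions are
hypothetical stand-ins for struct user_namespace, from_kuid() and make_kuid();
they only illustrate the arithmetic and are not the kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID UINT32_MAX	/* stands in for INVALID_UID/INVALID_GID */

/*
 * Hypothetical single-extent id map: ids [first, first + count) inside the
 * namespace correspond to kernel ids [lower_first, lower_first + count).
 */
struct model_userns {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Kernel id -> id as seen inside ns (models from_kuid()). */
static uint32_t model_from_kid(const struct model_userns *ns, uint32_t kid)
{
	if (kid < ns->lower_first || kid - ns->lower_first >= ns->count)
		return MODEL_INVALID_ID;
	return ns->first + (kid - ns->lower_first);
}

/* Id inside ns -> kernel id (models make_kuid()). */
static uint32_t model_make_kid(const struct model_userns *ns, uint32_t id)
{
	if (id < ns->first || id - ns->first >= ns->count)
		return MODEL_INVALID_ID;
	return ns->lower_first + (id - ns->first);
}

/* Translate out of the lower fs namespace, then into the shiftfs one. */
static uint32_t model_shift_kid(const struct model_userns *from,
				const struct model_userns *to, uint32_t kid)
{
	uint32_t id;

	assert(from != NULL && to != NULL);
	id = model_from_kid(from, kid);
	if (id == MODEL_INVALID_ID)
		return MODEL_INVALID_ID;	/* not mapped in the source ns */
	return model_make_kid(to, id);
}
```

With an identity map as the source (the init_user_ns case), the shift reduces
to a plain make_kuid() into the target namespace, which is what the removed
lines did.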

Signed-off-by: Seth Forshee 
---
 fs/shiftfs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 556594988dd2..226c03d8588b 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -613,6 +613,8 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat,
struct inode *reali = real->d_inode;
const struct inode_operations *iop = reali->i_op;
struct path newpath = { .mnt = path->dentry->d_sb->s_fs_info, .dentry = real };
+   struct user_namespace *from_ns = reali->i_sb->s_user_ns;
+   struct user_namespace *to_ns = inode->i_sb->s_user_ns;
int err = 0;
 
if (iop->getattr)
@@ -624,8 +626,8 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat,
return err;
 
/* transform the underlying id */
-   stat->uid = make_kuid(inode->i_sb->s_user_ns, __kuid_val(stat->uid));
-   stat->gid = make_kgid(inode->i_sb->s_user_ns, __kgid_val(stat->gid));
+   stat->uid = shift_kuid(from_ns, to_ns, stat->uid);
+   stat->gid = shift_kgid(from_ns, to_ns, stat->gid);
return 0;
 }
 
-- 
2.19.1

[RFC PATCH 5/6] shiftfs: add support for posix acls

2018-11-01 Thread Seth Forshee

Signed-off-by: Seth Forshee 
---
 fs/Kconfig   |  10 +++
 fs/shiftfs.c | 185 +++
 2 files changed, 195 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 392c5a41a9f9..691f3c4fc7eb 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -121,6 +121,16 @@ config SHIFT_FS
  unprivileged containers can use this to mount root volumes
  using this technique.
 
+config SHIFT_FS_POSIX_ACL
+   bool "shiftfs POSIX Access Control Lists"
+   depends on SHIFT_FS
+   select FS_POSIX_ACL
+   help
+ POSIX Access Control Lists (ACLs) support permissions for users and
+ groups beyond the owner/group/world scheme.
+
+ If you don't know what Access Control Lists are, say N.
+
 menu "Caches"
 
 source "fs/fscache/Kconfig"
diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 226c03d8588b..b19af7b2fe75 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 struct shiftfs_super_info {
struct vfsmount *mnt;
@@ -631,6 +633,183 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat,
return 0;
 }
 
+#ifdef CONFIG_SHIFT_FS_POSIX_ACL
+
+static int
+shift_acl_ids(struct user_namespace *from, struct user_namespace *to,
+ struct posix_acl *acl)
+{
+   int i;
+
+   for (i = 0; i < acl->a_count; i++) {
+   struct posix_acl_entry *e = &acl->a_entries[i];
+   switch(e->e_tag) {
+   case ACL_USER:
+   e->e_uid = shift_kuid(from, to, e->e_uid);
+   if (!uid_valid(e->e_uid))
+   return -EOVERFLOW;
+   break;
+   case ACL_GROUP:
+   e->e_gid = shift_kgid(from, to, e->e_gid);
+   if (!gid_valid(e->e_gid))
+   return -EOVERFLOW;
+   break;
+   }
+   }
+   return 0;
+}
+
+static void
+shift_acl_xattr_ids(struct user_namespace *from, struct user_namespace *to,
+   void *value, size_t size)
+{
+   struct posix_acl_xattr_header *header = value;
+   struct posix_acl_xattr_entry *entry = (void *)(header + 1), *end;
+   int count;
+   kuid_t kuid;
+   kgid_t kgid;
+
+   if (!value)
+   return;
+   if (size < sizeof(struct posix_acl_xattr_header))
+   return;
+   if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION))
+   return;
+
+   count = posix_acl_xattr_count(size);
+   if (count < 0)
+   return;
+   if (count == 0)
+   return;
+
+   for (end = entry + count; entry != end; entry++) {
+   switch(le16_to_cpu(entry->e_tag)) {
+   case ACL_USER:
+   kuid = make_kuid(&init_user_ns, le32_to_cpu(entry->e_id));
+   kuid = shift_kuid(from, to, kuid);
+   entry->e_id = cpu_to_le32(from_kuid(&init_user_ns, kuid));
+   break;
+   case ACL_GROUP:
+   kgid = make_kgid(&init_user_ns, le32_to_cpu(entry->e_id));
+   kgid = shift_kgid(from, to, kgid);
+   entry->e_id = cpu_to_le32(from_kgid(&init_user_ns, kgid));
+   break;
+   default:
+   break;
+   }
+   }
+}
+
+static struct posix_acl *shiftfs_get_acl(struct inode *inode, int type)
+{
+   struct inode *reali = inode->i_private;
+   const struct cred *oldcred, *newcred;
+   struct posix_acl *real_acl, *acl = NULL;
+   struct user_namespace *from_ns = reali->i_sb->s_user_ns;
+   struct user_namespace *to_ns = inode->i_sb->s_user_ns;
+   int size;
+   int err;
+
+   if (!IS_POSIXACL(reali))
+   return NULL;
+
+   oldcred = shiftfs_new_creds(&newcred, inode->i_sb);
+   real_acl = get_acl(reali, type);
+   shiftfs_old_creds(oldcred, &newcred);
+
+   if (real_acl && !IS_ERR(real_acl)) {
+   /* XXX: export posix_acl_clone? */
+   size = sizeof(struct posix_acl) +
+  real_acl->a_count * sizeof(struct posix_acl_entry);
+   acl = kmemdup(real_acl, size, GFP_KERNEL);
+   posix_acl_release(real_acl);
+
+   if (!acl)
+   return ERR_PTR(-ENOMEM);
+
+   refcount_set(&acl->a_refcount, 1);
+
+   err = shift_acl_ids(from_ns, to_ns, acl);
+   if (err) {
+   kfree(acl);
+   return ERR_PTR(err);
+   }
+   }
+
+   return acl;
+}
+
+static int
+shiftfs_posix_acl_xattr_get(const struct xattr_handler *handler,
+  struct dentry *dentry, struct inode *inode,
+  const char *name, void *buffer, size_t size)
+{
+   struct in
