Re: [PATCH 14/15] ARM: pxa: change SSP devices allocation

2018-04-04 Thread Robert Jarzmik
Arnd Bergmann  writes:

> I'm still unable to follow through that code, but I understand now that
> the device you pass to dma_request_slave_channel() is not the one
> we'd like it to be here.
>
> Where exactly does that call to dma_request_chan() happen? Is this
> the one in dmaengine_pcm_new()? Could we perhaps put a
> pointer to the SSP device into snd_dmaengine_dai_dma_data?
This is a sample stack I captured with an added WARN_ON(1), triggered by a
userland "aplay Sultans_Of_Swing.wav" :)

[  299.216743] [] (unwind_backtrace) from [] 
(show_stack+0x20/0x24)
[  299.223986] [] (show_stack) from [] 
(dump_stack+0x20/0x28)
[  299.231321] [] (dump_stack) from [] (__warn+0xf0/0x11c)
[  299.238183] [] (__warn) from [] 
(warn_slowpath_null+0x4c/0x58)
[  299.245234] [] (warn_slowpath_null) from [] 
(dma_request_chan+0x40/0x228)
[  299.252550] [] (dma_request_chan) from [] 
(dma_request_slave_channel+0x18/0x24)
[  299.259855] [] (dma_request_slave_channel) from [] 
(__pxa2xx_pcm_open+0xf4/0x110)
[  299.266789] [] (__pxa2xx_pcm_open) from [] 
(soc_pcm_open+0xf8/0x9c8)
[  299.273932] [] (soc_pcm_open) from [] 
(snd_pcm_open_substream+0x9c/0x134)
[  299.281290] [] (snd_pcm_open_substream) from [] 
(snd_pcm_open+0xbc/0x22c)
[  299.288255] [] (snd_pcm_open) from [] 
(snd_pcm_playback_open+0x50/0x88)
[  299.295468] [] (snd_pcm_playback_open) from [] 
(snd_open+0x124/0x144)
[  299.302897] [] (snd_open) from [] 
(chrdev_open+0x1a0/0x1f0)
[  299.310296] [] (chrdev_open) from [] 
(do_dentry_open.constprop.0+0x1d4/0x31c)
[  299.317345] [] (do_dentry_open.constprop.0) from [] 
(vfs_open+0x7c/0x80)
[  299.324597] [] (vfs_open) from [] 
(path_openat+0xbe8/0xf90)
[  299.332003] [] (path_openat) from [] 
(do_filp_open+0x80/0xe4)
[  299.339044] [] (do_filp_open) from [] 
(do_sys_open+0x148/0x1f8)
[  299.346225] [] (do_sys_open) from [] (SyS_open+0x2c/0x30)
[  299.353505] [] (SyS_open) from [] 
(ret_fast_syscall+0x0/0x28)

Cheers.

-- 
Robert


Re: [PATCH 00/15] ARM: pxa: switch to DMA slave maps

2018-04-04 Thread Arnd Bergmann
On Thu, Apr 5, 2018 at 8:29 AM, Ulf Hansson  wrote:
> On 4 April 2018 at 21:56, Boris Brezillon  wrote:
>> On Wed, 04 Apr 2018 21:49:26 +0200
>> Robert Jarzmik  wrote:
>>
>>> Ulf Hansson  writes:
>>>
>>> > On 2 April 2018 at 16:26, Robert Jarzmik  wrote:
>>> >> Hi,
>>> >>
>>> >> This serie is aimed at removing the dmaengine slave compat use, and 
>>> >> transfer
>>> >> knowledge of the DMA requestors into architecture code.
>>> >> As this looks like a patch bomb, each maintainer expressing for his tree 
>>> >> either
>>> >> an Ack or "I want to take through my tree" will be spared in the next 
>>> >> iterations
>>> >> of this serie.
>>> >
>>> > Perhaps an option is to send this hole series as PR for 3.17 rc1, that
>>> > would removed some churns and make this faster/easier? Well, if you
>>> > receive the needed acks of course.
>>> For 3.17-rc1 it looks a bit optimistic with the review time ... If I have 
>>> all
>>
>> Especially since 3.17-rc1 has been released more than 3 years ago :-),
>> but I guess you meant 4.17-rc1.
>
> Yeah, I realize that I was a bit lost in time yesterday. Even more
> people have been having fun about it (me too). :-)

I occasionally still type 2.6.17 when I mean 4.17.

   Arnd


Re: sched_rt_period_timer causing large latencies

2018-04-04 Thread Mike Galbraith
On Thu, 2018-04-05 at 09:11 +1000, Nicholas Piggin wrote:
> Hi,
> 
> I'm seeing some pretty big latencies on a ~idle system when a CPU wakes
> out of a nohz idle. Looks like it's due to the taking a lot of remote
> locks and cache lines. irqoff trace:
> 
> latency: 407 us, #608/608, CPU#3 | (M:server VP:0, KP:0, SP:0 HP:0 #P:176)
> 
>   -0   3d...0us : decrementer_common
>   -0   3d...1us : timer_interrupt <-decrementer_common
>   -0   3d...2us : irq_enter <-timer_interrupt
>   -0   3d...2us : rcu_irq_enter <-irq_enter
>   -0   3d...3us : rcu_nmi_enter <-irq_enter
>   -0   3d...4us : rcu_dynticks_eqs_exit <-rcu_nmi_enter
>   -0   3d...4us : __local_bh_disable_ip <-irq_enter
>   -0   3d...5us : tick_irq_enter <-irq_enter
>   -0   3d...6us : tick_check_oneshot_broadcast_this_cpu 
> <-tick_irq_enter
>   -0   3d...6us : ktime_get <-tick_irq_enter
>   -0   3d...7us : update_ts_time_stats <-tick_irq_enter
>   -0   3d...8us : nr_iowait_cpu <-update_ts_time_stats
>   -0   3d...9us : _local_bh_enable <-irq_enter
>   -0   3d...   10us : __local_bh_enable <-irq_enter
>   -0   3d.h.   10us : __timer_interrupt <-timer_interrupt
>   -0   3d.h.   11us : hrtimer_interrupt <-__timer_interrupt
>   -0   3d.h.   12us : _raw_spin_lock_irqsave <-hrtimer_interrupt
>   -0   3d.h.   13us : ktime_get_update_offsets_now 
> <-hrtimer_interrupt
>   -0   3d.h.   13us : __hrtimer_run_queues <-hrtimer_interrupt
>   -0   3d.h.   14us : __remove_hrtimer <-__hrtimer_run_queues
>   -0   3d.h.   15us : _raw_spin_unlock_irqrestore 
> <-__hrtimer_run_queues
>   -0   3d.h.   16us : tick_sched_timer <-__hrtimer_run_queues
>   -0   3d.h.   16us : ktime_get <-tick_sched_timer
>   -0   3d.h.   17us : tick_sched_do_timer <-tick_sched_timer
>   -0   3d.h.   18us : tick_sched_handle.isra.5 <-tick_sched_timer
>   -0   3d.h.   19us : update_process_times 
> <-tick_sched_handle.isra.5
>   -0   3d.h.   19us : account_process_tick <-update_process_times
>   -0   3d.h.   20us : run_local_timers <-update_process_times
>   -0   3d.h.   21us : hrtimer_run_queues <-run_local_timers
>   -0   3d.h.   21us : raise_softirq <-run_local_timers
>   -0   3d.h.   22us : __raise_softirq_irqoff <-raise_softirq
>   -0   3d.h.   23us : rcu_check_callbacks <-update_process_times
>   -0   3d.h.   24us : rcu_sched_qs <-rcu_check_callbacks
>   -0   3d.h.   25us : rcu_bh_qs <-rcu_check_callbacks
>   -0   3d.h.   25us : rcu_segcblist_ready_cbs <-rcu_check_callbacks
>   -0   3d.h.   26us : cpu_needs_another_gp <-rcu_check_callbacks
>   -0   3d.h.   27us : invoke_rcu_core <-rcu_check_callbacks
>   -0   3d.h.   28us : raise_softirq <-invoke_rcu_core
>   -0   3d.h.   28us : __raise_softirq_irqoff <-raise_softirq
>   -0   3d.h.   29us : scheduler_tick <-update_process_times
>   -0   3d.h.   30us : _raw_spin_lock <-scheduler_tick
>   -0   3d.h.   31us : update_rq_clock <-scheduler_tick
>   -0   3d.h.   31us : task_tick_idle <-scheduler_tick
>   -0   3d.h.   32us : cpu_load_update_active <-scheduler_tick
>   -0   3d.h.   33us : tick_nohz_tick_stopped 
> <-cpu_load_update_active
>   -0   3d.h.   33us : cpu_load_update <-scheduler_tick
>   -0   3d.h.   34us : sched_avg_update <-cpu_load_update
>   -0   3d.h.   35us : calc_global_load_tick <-scheduler_tick
>   -0   3d.h.   36us : trigger_load_balance <-scheduler_tick
>   -0   3d.h.   36us : raise_softirq <-trigger_load_balance
>   -0   3d.h.   37us : __raise_softirq_irqoff <-raise_softirq
>   -0   3d.h.   38us : run_posix_cpu_timers <-update_process_times
>   -0   3d.h.   39us : profile_tick <-tick_sched_handle.isra.5
>   -0   3d.h.   39us : hrtimer_forward <-tick_sched_timer
>   -0   3d.h.   40us : _raw_spin_lock_irq <-__hrtimer_run_queues
>   -0   3d.h.   41us : enqueue_hrtimer <-__hrtimer_run_queues
>   -0   3d.h.   42us : __remove_hrtimer <-__hrtimer_run_queues
>   -0   3d.h.   42us : _raw_spin_unlock_irqrestore 
> <-__hrtimer_run_queues
>   -0   3d.h.   44us : sched_rt_period_timer <-__hrtimer_run_queues
>   -0   3d.h.   44us : _raw_spin_lock <-sched_rt_period_timer
>   -0   3d.h.   45us : ktime_get <-sched_rt_period_timer
>   -0   3d.h.   46us : hrtimer_forward <-sched_rt_period_timer
>   -0   3d.h.   47us : _raw_spin_lock <-sched_rt_period_timer
>   -0   3d.h.   48us : _raw_spin_lock <-sched_rt_period_timer
>   ... 527 more locks snipped ...
>   -0   3d.h.  403us : ktime_get <-sched_rt_period_timer
>   -0   3d.h.  403us : hrtimer_forward <-sched_rt_period_timer
>   -0   3d.h.  404us : _raw_spin_lock_irq <-__hrtimer_run_queues
>   -0   3d.h.  404us : __hrtimer_get_next_event <-hrtimer_interrupt
>   -0   3d.h.  404us : __hrtimer_next_event_base 
> <-__hrtimer_get_next_event
>   -0   3d.h.  405us : __hrtimer_ne

Re: [PATCH v5 1/3] regulator: axp20x: add drivevbus support for axp803

2018-04-04 Thread Maxime Ripard
On Thu, Apr 05, 2018 at 12:11:39PM +0530, Jagan Teki wrote:
> On Tue, Mar 27, 2018 at 11:01 AM, Jagan Teki  
> wrote:
> > Like axp221, axp223, axp813 the axp803 is also supporting external
> > regulator to drive the  OTG VBus through N_VBUSEN PMIC pin.
> >
> > Add support for it.
> >
> > Signed-off-by: Jagan Teki 
> > Reviewed-by: Rob Herring 
> > Reviewed-by: Chen-Yu Tsai 
> > ---
> > Changes for v5:
> > - Collect Chen-Yu reviewed-by tag
> > Changes for v4:
> > - rebase on master
> > Changes for v3:
> > - Update drivevbus in table of regulators
> 
> Can you pick these, has some dependency with drivevbus on other
> patches.

I'm not the regulator maintainer, nor the AXP maintainer for that
matter. Mark Brown and Chen-Yu are, respectively.

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


signature.asc
Description: PGP signature


Re: [PATCH v5 1/3] regulator: axp20x: add drivevbus support for axp803

2018-04-04 Thread Jagan Teki
On Tue, Mar 27, 2018 at 11:01 AM, Jagan Teki  wrote:
> Like axp221, axp223, axp813 the axp803 is also supporting external
> regulator to drive the  OTG VBus through N_VBUSEN PMIC pin.
>
> Add support for it.
>
> Signed-off-by: Jagan Teki 
> Reviewed-by: Rob Herring 
> Reviewed-by: Chen-Yu Tsai 
> ---
> Changes for v5:
> - Collect Chen-Yu reviewed-by tag
> Changes for v4:
> - rebase on master
> Changes for v3:
> - Update drivevbus in table of regulators

Can you pick these, has some dependency with drivevbus on other patches.

Jagan.


Re: WARNING: kobject bug in sysfs_warn_dup

2018-04-04 Thread Greg KH
On Wed, Apr 04, 2018 at 07:02:01PM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on upstream commit
> 3e968c9f1401088abc9a19ae6ff571644d37a355 (Wed Apr 4 21:19:24 2018 +)
> Merge tag 'ext4_for_linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=ff87a28e665c163aa7f5
> 
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5104666266304512
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5683447737614336
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5104818200772608
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=9118669095563550941
> compiler: gcc (GCC) 7.1.1 20170620
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ff87a28e665c163aa...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> R10:  R11: 0286 R12: 0003
> R13: 0004 R14:  R15: 
> [ cut here ]
> kobject_add_internal failed for nodev( with -EEXIST, don't try to register
> things with the same name in the same directory.
> sysfs: cannot create duplicate filename '/fs/gfs2/nodev('
> WARNING: CPU: 1 PID: 4473 at lib/kobject.c:238
> kobject_add_internal+0x8d4/0xbc0 lib/kobject.c:235
> CPU: 0 PID: 4474 Comm: syzkaller533472 Not tainted 4.16.0+ #15
> Kernel panic - not syncing: panic_on_warn set ...
> 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x1a7/0x27d lib/dump_stack.c:53
>  sysfs_warn_dup+0x83/0xa0 fs/sysfs/dir.c:30
>  sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:58
>  create_dir lib/kobject.c:69 [inline]
>  kobject_add_internal+0x335/0xbc0 lib/kobject.c:227
>  kobject_add_varg lib/kobject.c:364 [inline]
>  kobject_init_and_add+0xf9/0x150 lib/kobject.c:436
>  gfs2_sys_fs_add+0x1ff/0x580 fs/gfs2/sys.c:652
>  fill_super+0x86f/0x1d70 fs/gfs2/ops_fstype.c:1118
>  gfs2_mount+0x587/0x6e0 fs/gfs2/ops_fstype.c:1321

gfs2 bug, not a sysfs bug, we are correctly warning about an incorrect
usage of the api.

Now if we should turn this into a non-WARN message, that's a different
thing, I'll gladly take a patch for that.

thanks,

greg k-h


Re: [GIT PULL] USB/PHY driver patches for 4.17-rc1

2018-04-04 Thread Kees Cook
On Wed, Apr 4, 2018 at 3:31 AM, Greg KH  wrote:
> Lars-Peter Clausen (2):
>   usb: gadget: ffs: Execute copy_to_user() with USER_DS set

https://git.kernel.org/linus/4058ebf33cb0be88ca516f968eda24ab7b6b93e4

Isn't there a better way to do this without the set_fs() usage? We've
been try to eliminate it in the kernel. I thought there was a safer
way to use iters now?

-Kees

-- 
Kees Cook
Pixel Security


linux-next: manual merge of the akpm tree with the nvdimm tree

2018-04-04 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  fs/dax.c

between commit:

  fab6964f22b9 ("mm, fs, dax: handle layout changes to pinned dax mappings")

which was removed from the nvdimm tree today and patch:

  "page cache: use xa_lock"

from the akpm tree.

I fixed it up (I removed the dax_layout_busy_page() function) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgp5TTv784Szz.pgp
Description: OpenPGP digital signature


[PATCH v2] perf/x86/intel: move regs->flags EXACT bit init

2018-04-04 Thread Stephane Eranian
This patch removes a redundant store on regs->flags introduced
by commit:

71eb9ee9596d ("perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and 
later CPUs")

We were clearing the PERF_EFLAGS_EXACT but it was overwritten by
regs->flags = pebs->flags later on.

The PERF_EFLAGS_EXACT is a software flag using bit 3 of regs->flags.
X86 marks this bit as Reserved. To make sure this bit is zero before
we do any IP processing, we clear it explicitly.

Patch also removes the following assignment:
regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM);

Because there is no regs->flags to preserve anymore because
set_linear_ip() is not called until later.

In V2, we improve reability of the comments.

Signed-off-by: Stephane Eranian 
---
 arch/x86/events/intel/ds.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index da6780122786..e4e97fc8eec3 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1153,7 +1153,6 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
if (pebs == NULL)
return;
 
-   regs->flags &= ~PERF_EFLAGS_EXACT;
sample_type = event->attr.sample_type;
dsrc = sample_type & PERF_SAMPLE_DATA_SRC;
 
@@ -1197,7 +1196,13 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
 * and PMI.
 */
*regs = *iregs;
-   regs->flags = pebs->flags;
+
+   /*
+* Initialize regs_>flags from PEBS,
+* clear exact bit (which uses x86 EFLAGS Reserved bit 3),
+* i.e., do not rely on it being zero.
+*/
+   regs->flags = pebs->flags & ~PERF_EFLAGS_EXACT;
 
if (sample_type & PERF_SAMPLE_REGS_INTR) {
regs->ax = pebs->ax;
@@ -1217,10 +1222,6 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
regs->sp = pebs->sp;
}
 
-   /*
-* Preserve PERF_EFLAGS_VM from set_linear_ip().
-*/
-   regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM);
 #ifndef CONFIG_X86_32
regs->r8 = pebs->r8;
regs->r9 = pebs->r9;
@@ -1234,20 +1235,33 @@ static void setup_pebs_sample_data(struct perf_event 
*event,
}
 
if (event->attr.precise_ip > 1) {
-   /* Haswell and later have the eventing IP, so use it: */
+   /*
+* Haswell and later processors have eventing IP
+* which avoids the off-by-1 skid. Use it when
+* precise_ip >= 2
+*/
if (x86_pmu.intel_cap.pebs_format >= 2) {
set_linear_ip(regs, pebs->real_ip);
regs->flags |= PERF_EFLAGS_EXACT;
} else {
-   /* Otherwise use PEBS off-by-1 IP: */
+   /* Otherwise, use PEBS off-by-1 IP */
set_linear_ip(regs, pebs->ip);
 
-   /* ... and try to fix it up using the LBR entries: */
+   /*
+* With precise_ip >= 2, try to fix up the off-by-1 IP
+* using the LBR. If successful, the fixup function
+* corrects regs->ip and calls set_linear_ip() on regs.
+*/
if (intel_pmu_pebs_fixup_ip(regs))
regs->flags |= PERF_EFLAGS_EXACT;
}
-   } else
+   } else {
+   /*
+* when precise_ip == 1, return the PEBS off-by-1 IP,
+* no fixup attempted
+*/
set_linear_ip(regs, pebs->ip);
+   }
 
 
if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
-- 
2.7.4



Re: [PATCH 00/15] ARM: pxa: switch to DMA slave maps

2018-04-04 Thread Ulf Hansson
On 4 April 2018 at 21:56, Boris Brezillon  wrote:
> On Wed, 04 Apr 2018 21:49:26 +0200
> Robert Jarzmik  wrote:
>
>> Ulf Hansson  writes:
>>
>> > On 2 April 2018 at 16:26, Robert Jarzmik  wrote:
>> >> Hi,
>> >>
>> >> This serie is aimed at removing the dmaengine slave compat use, and 
>> >> transfer
>> >> knowledge of the DMA requestors into architecture code.
>> >> As this looks like a patch bomb, each maintainer expressing for his tree 
>> >> either
>> >> an Ack or "I want to take through my tree" will be spared in the next 
>> >> iterations
>> >> of this serie.
>> >
>> > Perhaps an option is to send this hole series as PR for 3.17 rc1, that
>> > would removed some churns and make this faster/easier? Well, if you
>> > receive the needed acks of course.
>> For 3.17-rc1 it looks a bit optimistic with the review time ... If I have all
>
> Especially since 3.17-rc1 has been released more than 3 years ago :-),
> but I guess you meant 4.17-rc1.

Yeah, I realize that I was a bit lost in time yesterday. Even more
people have been having fun about it (me too). :-)

Kind regards
Uffe


Re: [PATCH 6/7] perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer

2018-04-04 Thread Adrian Hunter
On 07/03/18 16:11, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 07, 2018 at 10:06:50AM +0200, Adrian Hunter escreveu:
>> On 06/03/18 22:25, Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Mar 06, 2018 at 11:13:17AM +0200, Adrian Hunter escreveu:
 In preparation for supporting AUX area sampling buffers,
 auxtrace_queues__add_buffer() needs to be more generic. To that end, move
 memory allocation for struct buffer into it.

 Signed-off-by: Adrian Hunter 
 ---
  tools/perf/util/auxtrace.c | 54 
 +-
  1 file changed, 24 insertions(+), 30 deletions(-)

 diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
 index fb357a00dd86..e1aff91c54a8 100644
 --- a/tools/perf/util/auxtrace.c
 +++ b/tools/perf/util/auxtrace.c
 @@ -308,7 +308,11 @@ static int auxtrace_queues__add_buffer(struct 
 auxtrace_queues *queues,
   struct auxtrace_buffer *buffer,
   struct auxtrace_buffer **buffer_ptr)
  {
 -  int err;
 +  int err = -ENOMEM;
 +
 +  buffer = memdup(buffer, sizeof(*buffer));
>>>
>>> this is a bit strange, why not make buffer a local variable in this
>>> function then?
>>
>> Do you mean the following?
>>
>>  struct auxtrace_buffer *new_buf;
>>
>>  new_buf = memdup(buffer, sizeof(*buffer));
> 
> I hadn't noticed that you were using buffer as both r and l value :-\
> 
> If all you want is to receive that buffer, duplicate it and then use
> just the duplicate, not needing any reference to the original buffer,
> then your code is correct, it just looked strange from a quick look, so
> nevermind, I'll continue processing this one and the others.

Looks like this patch and patch 7 "perf auxtrace: Make
auxtrace_queues__add_buffer() do CPU filtering" got left behind.  They still
apply cleanly.


Re: [PATCH] drm/sched: Extend the documentation.

2018-04-04 Thread Daniel Vetter
On Thu, Apr 5, 2018 at 12:32 AM, Eric Anholt  wrote:
> These comments answer all the questions I had for myself when
> implementing a driver using the GPU scheduler.
>
> Signed-off-by: Eric Anholt 

Pulling all these comments into the generated kerneldoc would be
awesome, maybe as a new "GPU Scheduler" chapter at the end of
drm-mm.rst? Would mean a bit of busywork to convert the existing raw
comments into proper kerneldoc. Also has the benefit that 0day will
complain when you forget to update the comment when editing the
function prototype - kerneldoc which isn't included anywhere in .rst
won't be checked automatically.
-Daniel

> ---
>  include/drm/gpu_scheduler.h | 46 +
>  1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index dfd54fb94e10..c053a32341bf 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -43,10 +43,12 @@ enum drm_sched_priority {
>  };
>
>  /**
> - * A scheduler entity is a wrapper around a job queue or a group
> - * of other entities. Entities take turns emitting jobs from their
> - * job queues to corresponding hardware ring based on scheduling
> - * policy.
> + * drm_sched_entity - A wrapper around a job queue (typically attached
> + * to the DRM file_priv).
> + *
> + * Entities will emit jobs in order to their corresponding hardware
> + * ring, and the scheduler will alternate between entities based on
> + * scheduling policy.
>  */
>  struct drm_sched_entity {
> struct list_headlist;
> @@ -78,7 +80,18 @@ struct drm_sched_rq {
>
>  struct drm_sched_fence {
> struct dma_fencescheduled;
> +
> +   /* This fence is what will be signaled by the scheduler when
> +* the job is completed.
> +*
> +* When setting up an out fence for the job, you should use
> +* this, since it's available immediately upon
> +* drm_sched_job_init(), and the fence returned by the driver
> +* from run_job() won't be created until the dependencies have
> +* resolved.
> +*/
> struct dma_fencefinished;
> +
> struct dma_fence_cb cb;
> struct dma_fence*parent;
> struct drm_gpu_scheduler*sched;
> @@ -88,6 +101,13 @@ struct drm_sched_fence {
>
>  struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>
> +/**
> + * drm_sched_job - A job to be run by an entity.
> + *
> + * A job is created by the driver using drm_sched_job_init(), and
> + * should call drm_sched_entity_push_job() once it wants the scheduler
> + * to schedule the job.
> + */
>  struct drm_sched_job {
> struct spsc_nodequeue_node;
> struct drm_gpu_scheduler*sched;
> @@ -112,10 +132,28 @@ static inline bool drm_sched_invalidate_job(struct 
> drm_sched_job *s_job,
>   * these functions should be implemented in driver side
>  */
>  struct drm_sched_backend_ops {
> +   /* Called when the scheduler is considering scheduling this
> +* job next, to get another struct dma_fence for this job to
> +* block on.  Once it returns NULL, run_job() may be called.
> +*/
> struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
> struct drm_sched_entity *s_entity);
> +
> +   /* Called to execute the job once all of the dependencies have
> +* been resolved.  This may be called multiple times, if
> +* timedout_job() has happened and drm_sched_job_recovery()
> +* decides to try it again.
> +*/
> struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
> +
> +   /* Called when a job has taken too long to execute, to trigger
> +* GPU recovery.
> +*/
> void (*timedout_job)(struct drm_sched_job *sched_job);
> +
> +   /* Called once the job's finished fence has been signaled and
> +* it's time to clean it up.
> +*/
> void (*free_job)(struct drm_sched_job *sched_job);
>  };
>
> --
> 2.17.0
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [RFC PATCH 1/1 v2] vmscan: Support multiple kswapd threads per node

2018-04-04 Thread Michal Hocko
On Wed 04-04-18 21:49:54, Buddy Lumpkin wrote:
> v2:
> - Make update_kswapd_threads_node less racy
> - Handle locking for case where CONFIG_MEMORY_HOTPLUG=n

Please do not repost with such a small changes. It is much more
important to sort out the big picture first and only then deal with
minor implementation details. The more versions you post the more
fragmented and messy the discussion will become.

You will have to be patient because this is a rather big change and it
will take _quite_ some time to get sorted.

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 03/19] powerpc: Mark variables as unused

2018-04-04 Thread Michael Ellerman
LEROY Christophe  writes:

> Mathieu Malaterre  a écrit :
>
>> Add gcc attribute unused for two variables. Fix warnings treated as errors
>> with W=1:
>>
>>   arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set  
>> but not used [-Werror=unused-but-set-variable]
>>
>> Suggested-by: Christophe Leroy 
>> Signed-off-by: Mathieu Malaterre 
>> ---
>> v2: move path within ifdef DEBUG_PROM
>>
>>  arch/powerpc/kernel/prom_init.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/prom_init.c  
>> b/arch/powerpc/kernel/prom_init.c
>> index acf4b2e0530c..4163b11abb6c 100644
>> --- a/arch/powerpc/kernel/prom_init.c
>> +++ b/arch/powerpc/kernel/prom_init.c
>> @@ -603,7 +603,7 @@ static void __init early_cmdline_parse(void)
>>  const char *opt;
>>
>>  char *p;
>> -int l = 0;
>> +int l __maybe_unused = 0;
>>
>>  prom_cmd_line[0] = 0;
>>  p = prom_cmd_line;
>> @@ -1385,7 +1385,7 @@ static void __init reserve_mem(u64 base, u64 size)
>>  static void __init prom_init_mem(void)
>>  {
>>  phandle node;
>> -char *path, type[64];
>> +char *path __maybe_unused, type[64];
>
> You should enclose that in an ifdef DEBUG_PROM instead of hiding the warning

I disagree, the result is horrible:

 static void __init prom_init_mem(void)
 {
phandle node;
-   char *path, type[64];
+#ifdef DEBUG_PROM
+   char *path;
+#endif
+   char type[64];
unsigned int plen;
cell_t *p, *endp;
__be32 val;


The right fix is to move the debug logic into a helper, and put the path
in there, eg. something like (not tested):

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f9d6befb55a6..b02fa2ccc70b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1389,6 +1389,18 @@ static void __init reserve_mem(u64 base, u64 size)
mem_reserve_cnt = cnt + 1;
 }
 
+#ifdef DEBUG_PROM
+static void prom_debug_path(phandle node)
+{
+   char *path;
+   path = prom_scratch;
+   memset(path, 0, PROM_SCRATCH_SIZE);
+   call_prom("package-to-path", 3, 1, node, path, PROM_SCRATCH_SIZE-1);
+   prom_debug("  node %s :\n", path);
+}
+#else
+static void prom_debug_path(phandle node) { }
+#endif /* DEBUG_PROM */
 /*
  * Initialize memory allocation mechanism, parse "memory" nodes and
  * obtain that way the top of memory and RMO to setup out local allocator
@@ -1441,11 +1453,7 @@ static void __init prom_init_mem(void)
p = regbuf;
endp = p + (plen / sizeof(cell_t));
 
-#ifdef DEBUG_PROM
-   memset(path, 0, PROM_SCRATCH_SIZE);
-   call_prom("package-to-path", 3, 1, node, path, 
PROM_SCRATCH_SIZE-1);
-   prom_debug("  node %s :\n", path);
-#endif /* DEBUG_PROM */
+   prom_debug_path(node);
 
while ((endp - p) >= (rac + rsc)) {
unsigned long base, size;


Although that also begs the question of why the hell do we need path at
all, and not just use prom_scratch directly?

cheers


Re: [PATCH] kvm: Add emulation for movups/movupd

2018-04-04 Thread Paolo Bonzini
On 04/04/2018 19:35, Stefan Fritsch wrote:
> On Wednesday, 4 April 2018 19:24:20 CEST Paolo Bonzini wrote:
>> On 04/04/2018 19:10, Konrad Rzeszutek Wilk wrote:
>>> Should there be a corresponding test-case?
>>
>> Good point!  Stefan, could you write one?
> 
> Is there infrastructure for such tests? If yes, can you give me a pointer to 
> it?

Yes, check out x86/emulator.c in
https://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git.  There is
already a movaps test.

Paolo


Re: [PATCH 2/2] efi: Add embedded peripheral firmware support

2018-04-04 Thread Lukas Wunner
On Wed, Apr 04, 2018 at 01:18:36PM -0400, Peter Jones wrote:
> > On Tue, Apr 03, 2018 at 08:07:11PM +0200, Lukas Wunner wrote:
> > > * Add the EFI Firmware Volume Protocol to include/linux/efi.h:
> > >   
> > > https://www.intel.com/content/dam/doc/reference-guide/efi-firmware-file-volume-specification.pdf
> > > 
> > > * Amend arch/x86/boot/compressed/eboot.c to read the files with the
> > >   GUIDs you're interested in into memory and pass the files to the
> > >   kernel as setup_data payloads.
> 
> To be honest, I'm a bit skeptical about the firmware volume approach.
> Tools like UEFITool[0] and uefi-firmware-parser[1] have existed for
> years, still don't seem to reliably parse firmware images I see in the
> wild, and have a fairly regular need for fixes.  These are tools
> maintained by smart people who are making a real effort, and it still
> looks pretty hard to do a good job that applies across a lot of
> platforms.
> 
> So I'd rather use Hans's existing patches, at least for now, and if
> someone is interested in hacking on making an efi firmware volume parser
> for the kernel, switch them to that when such a thing is ready.

Hello?  As I've written in the above-quoted e-mail the kernel should
read the files using EFI_FIRMWARE_VOLUME_PROTOCOL.ReadFile().

*Not* by parsing the firmware volume!

Parsing the firmware volume is only necessary to find out the GUIDs
of the files you're looking for.  You only do that *once*.

I habitually extract Apple's EFI Firmware Volumes and found the
existing tools always did a sufficiently good job to get the
information I need (such as UEFIExtract, which I've linked to,
and which is part of the UEFITool suite, which you've linked to
once more).


On Wed, Apr 04, 2018 at 10:25:05PM +0200, Hans de Goede wrote:
> Lukas, thank you for your suggestions on this, but I doubt that these
> devices use the Firmware Volume stuff.

If they're using EFI, they're using a Firmware Volume and you should
be able to use the Firmware Volume Protocol to read files from it.

If that protocol wasn't supported, the existing EFI drivers wouldn't
be able to function as they couldn't load files from the volume.


> What you are describing sounds like significantly more work then
> the vendor just embedding the firmware as a char firmware[] in their
> EFI mouse driver.

In such cases, read the EFI mouse driver from the firmware volume and
extract the firmware from the offset you've pre-determined.  That's
never going to change since the devices never receive updates, as
you're saying.  So basically you'll have a struct with GUID, offset,
length, and the name under which the firmware is made available to
Linux drivers.


> That combined with Peter's worries about difficulties parsing the
> Firmware Volume stuff, makes me believe that it is best to just
> stick with my current approach as Peter suggests.

I don't know why you insist on a hackish solution (your own words)
even though a cleaner solution is suggested, but fortunately it's
not for me to decide what goes in and what doesn't. ;-)

Thanks,

Lukas


Re: [PATCH] prctl: Deprecate non PR_SET_MM_MAP operations

2018-04-04 Thread Michal Hocko
On Thu 05-04-18 00:29:06, Cyrill Gorcunov wrote:
> On Wed, Apr 04, 2018 at 01:53:08PM -0700, Randy Dunlap wrote:
[...]
> > Yeah, that one's wrong also.  :)
> 
> So, maybe just get rid of any warning message at all?

or simply do
pr_warn("PR_SET_MM_MAP has been removed. Use  instead\n");
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 1/3] kbuild: Support HOSTLDFLAGS

2018-04-04 Thread Masahiro Yamada
2018-03-29 9:48 GMT+09:00 Laura Abbott :
>
> In addition to HOSTCFLAGS, there's HOSTLDFLAGS. Ensure these get passed to
> calls to build host binaries.
>
> Signed-off-by: Laura Abbott 
> ---
>  scripts/Makefile.host  | 6 +++---
>  tools/build/Makefile.build | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/Makefile.host b/scripts/Makefile.host
> index e6dc6ae2d7c4..a3a0e2282a56 100644
> --- a/scripts/Makefile.host
> +++ b/scripts/Makefile.host
> @@ -84,7 +84,7 @@ hostcxx_flags  = -Wp,-MD,$(depfile) $(__hostcxx_flags)
>  # Create executable from a single .c file
>  # host-csingle -> Executable
>  quiet_cmd_host-csingle = HOSTCC  $@
> -  cmd_host-csingle = $(HOSTCC) $(hostc_flags) -o $@ $< \
> +  cmd_host-csingle = $(HOSTCC) $(HOSTLDFLAGS) $(hostc_flags) -o $@ $< \
> $(HOST_LOADLIBES) $(HOSTLOADLIBES_$(@F))
>  $(host-csingle): $(obj)/%: $(src)/%.c FORCE
> $(call if_changed_dep,host-csingle)


This hunk is correct, but Robin posted this.
https://patchwork.kernel.org/patch/10243073/

Somehow it fell into a crack, and I missed to pick it up.
I will apply it lately.




> @@ -102,7 +102,7 @@ $(call multi_depend, $(host-cmulti), , -objs)
>  # Create .o file from a single .c file
>  # host-cobjs -> .o
>  quiet_cmd_host-cobjs   = HOSTCC  $@
> -  cmd_host-cobjs   = $(HOSTCC) $(hostc_flags) -c -o $@ $<
> +  cmd_host-cobjs   = $(HOSTCC) $(HOSTLDFLAGS) $(hostc_flags) -c -o $@ $<
>  $(host-cobjs): $(obj)/%.o: $(src)/%.c FORCE
> $(call if_changed_dep,host-cobjs)
>

This does not involve the link stage.
You should not do this.


> @@ -126,7 +126,7 @@ $(host-cxxobjs): $(obj)/%.o: $(src)/%.cc FORCE
>  # Compile .c file, create position independent .o file
>  # host-cshobjs -> .o
>  quiet_cmd_host-cshobjs = HOSTCC  -fPIC $@
> -  cmd_host-cshobjs = $(HOSTCC) $(hostc_flags) -fPIC -c -o $@ $<
> +  cmd_host-cshobjs = $(HOSTCC) $(HOSTLDFLAGS) $(hostc_flags) -fPIC -c -o 
> $@ $<
>  $(host-cshobjs): $(obj)/%.o: $(src)/%.c FORCE
> $(call if_changed_dep,host-cshobjs)
>
> diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
> index cd72016c3cfa..cab55f0d90e1 100644
> --- a/tools/build/Makefile.build
> +++ b/tools/build/Makefile.build
> @@ -64,7 +64,7 @@ quiet_cmd_cc_o_c = CC   $@
>cmd_cc_o_c = $(CC) $(c_flags) -c -o $@ $<
>
>  quiet_cmd_host_cc_o_c = HOSTCC   $@
> -  cmd_host_cc_o_c = $(HOSTCC) $(host_c_flags) -c -o $@ $<
> +  cmd_host_cc_o_c = $(HOSTCC) $(HOSTLDFLAGS) $(host_c_flags) -c -o $@ $<
>
>  quiet_cmd_cxx_o_c = CXX  $@
>cmd_cxx_o_c = $(CXX) $(cxx_flags) -c -o $@ $<



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2 2/2] perf: riscv: Add Document for Future Porting Guide

2018-04-04 Thread Alan Kao
Hi Alex,

On Tue, Apr 03, 2018 at 07:08:43PM -0700, Alex Solomatnikov wrote:
> Doc fixes:
> 
>

Thanks for these fixes.  I'll edit this patch and send a v3 once I am done
with the PMU patch.

I suppose a "Reviewed-by: Alex Solomatnikov" appending at the end of the
commit will be great, right?

Alan

> diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
> index a3e930e..ae90a5e 100644
> --- a/Documentation/riscv/pmu.txt
> +++ b/Documentation/riscv/pmu.txt
> @@ -20,7 +20,7 @@ the lack of the following general architectural
> performance monitoring features:
>  * Enabling/Disabling counters
>Counters are just free-running all the time in our case.
>  * Interrupt caused by counter overflow
> -  No such design in the spec.
> +  No such feature in the spec.
>  * Interrupt indicator
>It is not possible to have many interrupt ports for all counters, so an
>interrupt indicator is required for software to tell which counter has
> @@ -159,14 +159,14 @@ interrupt for perf, so the details are to be
> completed in the future.
> 
>  They seem symmetric but perf treats them quite differently.  For reading, 
> there
>  is a *read* interface in *struct pmu*, but it serves more than just reading.
> -According to the context, the *read* function not only read the content of 
> the
> -counter (event->count), but also update the left period to the next interrupt
> +According to the context, the *read* function not only reads the content of 
> the
> +counter (event->count), but also updates the left period for the next 
> interrupt
>  (event->hw.period_left).
> 
>  But the core of perf does not need direct write to counters.  Writing 
> counters
> -hides behind the abstraction of 1) *pmu->start*, literally start
> counting so one
> +is hidden behind the abstraction of 1) *pmu->start*, literally start
> counting so one
>  has to set the counter to a good value for the next interrupt; 2)
> inside the IRQ
> -it should set the counter with the same reason.
> +it should set the counter to the same reasonable value.
> 
>  Reading is not a problem in RISC-V but writing would need some effort, since
>  counters are not allowed to be written by S-mode.
> @@ -190,37 +190,37 @@ Three states (event->hw.state) are defined:
>  A normal flow of these state transitions are as follows:
> 
>  * A user launches a perf event, resulting in calling to *event_init*.
> -* When being context-switched in, *add* is called by the perf core, with flag
> -  PERF_EF_START, which mean that the event should be started after it is 
> added.
> -  In this stage, an general event is binded to a physical counter, if any.
> +* When being context-switched in, *add* is called by the perf core, with a 
> flag
> +  PERF_EF_START, which means that the event should be started after
> it is added.
> +  At this stage, a general event is bound to a physical counter, if any.
>The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE,
> because it is now
>stopped, and the (software) event count does not need updating.
>  ** *start* is then called, and the counter is enabled.
> -   With flag PERF_EF_RELOAD, it write the counter to an appropriate
> value (check
> -   previous section for detail).
> -   No writing is made if the flag does not contain PERF_EF_RELOAD.
> -   The state now is reset to none, because it is neither stopped nor update
> -   (the counting already starts)
> -* When being context-switched out, *del* is called.  It then checkout all the
> -  events in the PMU and call *stop* to update their counts.
> +   With flag PERF_EF_RELOAD, it writes an appropriate value to the
> counter (check
> +   the previous section for details).
> +   Nothing is written if the flag does not contain PERF_EF_RELOAD.
> +   The state now is reset to none, because it is neither stopped nor updated
> +   (the counting already started)
> +* When being context-switched out, *del* is called.  It then checks out all 
> the
> +  events in the PMU and calls *stop* to update their counts.
>  ** *stop* is called by *del*
> and the perf core with flag PERF_EF_UPDATE, and it often shares the same
> subroutine as *read* with the same logic.
> The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again.
> 
> -** Life cycles of these two pairs: *add* and *del* are called repeatedly as
> +** Life cycle of these two pairs: *add* and *del* are called repeatedly as
>tasks switch in-and-out; *start* and *stop* is also called when the perf 
> core
>needs a quick stop-and-start, for instance, when the interrupt
> period is being
>adjusted.
> 
> -Current implementation is sufficient for now and can be easily extend to
> +Current implementation is sufficient for now and can be easily
> extended with new
>  features in the future.
> 
>  A. Related Structures
>  -
> 
> -* struct pmu: include/linux/perf_events.h
> -* struct riscv_pmu: arch/riscv/include/asm/perf_events.h
> +* struct pmu: include/linux/perf_event.h
> +* s

Re: [PATCH 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

2018-04-04 Thread Alan Kao
On Tue, Apr 03, 2018 at 03:45:17PM -0700, Palmer Dabbelt wrote:
> On Tue, 03 Apr 2018 07:29:02 PDT (-0700), alan...@andestech.com wrote:
> >On Mon, Apr 02, 2018 at 08:15:44PM -0700, Palmer Dabbelt wrote:
> >>On Mon, 02 Apr 2018 05:31:22 PDT (-0700), alan...@andestech.com wrote:
> >>>This implements the baseline PMU for RISC-V platforms.
> >>>
> >>>To ease future PMU portings, a guide is also written, containing
> >>>perf concepts, arch porting practices and some hints.
> >>>
> >>>Changes in v2:
> >>> - Fix the bug reported by Alex, which was caused by not sufficient
> >>>   initialization.  Check https://lkml.org/lkml/2018/3/31/251 for the
> >>>   discussion.
> >>>
> >>>Alan Kao (2):
> >>>  perf: riscv: preliminary RISC-V support
> >>>  perf: riscv: Add Document for Future Porting Guide
> >>>
> >>> Documentation/riscv/pmu.txt | 249 +++
> >>> arch/riscv/Kconfig  |  12 +
> >>> arch/riscv/include/asm/perf_event.h |  76 +-
> >>> arch/riscv/kernel/Makefile  |   1 +
> >>> arch/riscv/kernel/perf_event.c  | 468 
> >>> 
> >>> 5 files changed, 802 insertions(+), 4 deletions(-)
> >>> create mode 100644 Documentation/riscv/pmu.txt
> >>> create mode 100644 arch/riscv/kernel/perf_event.c
> >>
> >>I'm having some trouble pulling this into my tree.  I think you might have
> >>another patch floating around somewhere, as I don't have any
> >>arch/riscv/include/asm/perf_event.h right now.
> >>
> >>Do you mind rebasing this on top of linux-4.16 so I can look properly?
> >>
> >>Thanks!
> >
> >Sorry for the inconvenience, but this patch was based on Alex's patch at
> >https://github.com/riscv/riscv-linux/pull/124/files.  I thought that one
> >had already been picked into your tree.
> >
> >Any ideas?
> 
> Thanks, it applies on top of that.  I'm going to play around with this a
> bit, but it looks generally good.

Note that to make it work better when wraparound occurs, you should change the
value of *.counter_width* into the width of real hardware counters.  This is
because this patch does not handle wraparound checking, so using a wider
bit mask may sometimes report a extremely large number.

Ideally this should be done by adding a Kconfig option called 
"Hifive Unleashed PMU" which automatically sets the width an reuses most of the
baseline codes.  What do you think about this?

Thanks.


Re: [PATCH 3/3] kbuild: Allow passing additional HOSTCFLAGS and HOSTLDFLAGS

2018-04-04 Thread Masahiro Yamada
2018-03-29 9:48 GMT+09:00 Laura Abbott :
>
> Similar to AFLAGS_KBUILD, there may be uses (e.g. hardening) for passing in
> additional flags to host programs. Allow these to be passed in from the
> environment.
>
> Signed-off-by: Laura Abbott 
> ---
>  Documentation/kbuild/kbuild.txt | 9 +
>  Makefile| 3 +++
>  2 files changed, 12 insertions(+)
>
> diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
> index ac2363ea05c5..3751a4bc8596 100644
> --- a/Documentation/kbuild/kbuild.txt
> +++ b/Documentation/kbuild/kbuild.txt
> @@ -24,6 +24,15 @@ KAFLAGS
>  --
>  Additional options to the assembler (for built-in and modules).
>
> +AFLAGS_HOSTCFLAGS
> +--
> +Additional options passed to the compiler when building host programs.
> +
> +AFLAGS_HOSTLDFLAGS
> +--
> +Additional options passed to the linker (through the compiler) when buidling
> +host programs.
> +


I am afraid you misunderstood the meaning of 'AFLAGS'.

AFLAGS is not 'Additional flags', but 'Assembler flags'
that are used for compiling *.S files.

AFLAGS for host programs is weird.



I see similar proposals from different people.

I replied like follows:
https://lkml.org/lkml/2018/2/28/178

However, Robin seems busy lately.

I will wait a bit, then
if nobody does this, I may do it.




>  AFLAGS_MODULE
>  --
>  Additional module specific options to use for $(AS).
> diff --git a/Makefile b/Makefile
> index 7ba478ab8c82..2cab3f8d489c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -367,6 +367,9 @@ HOSTCXXFLAGS := -O2 $(HOST_LFS_CFLAGS)
>  HOSTLDFLAGS  := $(HOST_LFS_LDFLAGS)
>  HOST_LOADLIBES := $(HOST_LFS_LIBS)
>
> +HOSTCFLAGS  += $(AFLAGS_HOSTCFLAGS)
> +HOSTLDFLAGS  += $(AFLAGS_HOSTLDFLAGS)
> +
>  # Make variables (CC, etc...)
>  AS = $(CROSS_COMPILE)as
>  LD = $(CROSS_COMPILE)ld
> --
> 2.16.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


Re: [PATCH V2 6/7] thermal/drivers/cpu_cooling: Introduce the cpu idle cooling driver

2018-04-04 Thread Viresh Kumar
On 04-04-18, 10:50, Daniel Lezcano wrote:
> Mmh, that sounds very complex. May be it is simpler to count the number
> of cluster and initialize the idle_cdev for each cluster and then go for
> this loop with the cluster cpumask.

Maybe not sure. I have had such code in the past and it was quite
straight forward to understand :)

You can go with whichever version you like.

-- 
viresh


Re: [PATCH v15 0/9] Add io{read|write}64 to io-64-atomic headers

2018-04-04 Thread Michael Ellerman
Logan Gunthorpe  writes:
> On 4/4/2018 4:38 AM, Michael Ellerman wrote:
...
>> eg. It looks like I could take the two powerpc patches on their own for
>> 4.17, and then the rest could go via other trees?
>
> Yup! If you can take the powerpc patches I can keep trying to get the 
> rest in. They are largely independent and shouldn't really change 
> anything without the following patches.

OK, I'll take those then.

>> Is patch 1 stand alone?
>
> Essentially, yes, but patch 5 depends on it seeing it's changing the 
> same area and is trying to avoid creating the same kbuild warnings that 
> patch 1 suppresses.
>
>> The other option is to ask Andrew Morton to take it, as he often carries
>> these cross-tree type series.
>
> Thanks for the tip! I'll copy him when I repost it after the merge window.

Cool. If you want him to take it you should probably explicitly say so
when you repost.

cheers


Re: [PATCH 2/3] objtool: Support HOSTCFLAGS and HOSTLDFLAGS

2018-04-04 Thread Masahiro Yamada
2018-03-29 9:48 GMT+09:00 Laura Abbott :
> It may be useful to compile host programs with different flags (e.g.
> hardening). Ensure that objtool picks up the appropriate flags.
>
> Signed-off-by: Laura Abbott 
> ---


I saw some similar patches before.

I thought they are fixing this way
https://patchwork.kernel.org/patch/10251259/

but, I am not tacking the progress.




>  tools/objtool/Makefile | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
> index e6acc281dd37..0ff3bcac1ca9 100644
> --- a/tools/objtool/Makefile
> +++ b/tools/objtool/Makefile
> @@ -31,8 +31,9 @@ INCLUDES := -I$(srctree)/tools/include \
> -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi \
> -I$(srctree)/tools/objtool/arch/$(ARCH)/include
>  WARNINGS := $(EXTRA_WARNINGS) -Wno-switch-default -Wno-switch-enum 
> -Wno-packed
> -CFLAGS   += -Wall -Werror $(WARNINGS) -fomit-frame-pointer -O2 -g $(INCLUDES)
> -LDFLAGS  += -lelf $(LIBSUBCMD)
> +CFLAGS   += -Wall -Werror $(WARNINGS) $(HOSTCFLAGS) -fomit-frame-pointer -O2 
> -g \
> +   $(INCLUDES)
> +LDFLAGS  += -lelf $(LIBSUBCMD) $(HOSTLDFLAGS)
>
>  # Allow old libelf to be used:
>  elfshdr := $(shell echo '\#include ' | $(CC) $(CFLAGS) -x c -E - | 
> grep elf_getshdr)
> --
> 2.16.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards
Masahiro Yamada


[PATCH] uapi: fix linux/kfd_ioctl.h userspace compilation errors

2018-04-04 Thread Dmitry V. Levin
Consistently use types provided by  via 
to fix the following linux/kfd_ioctl.h userspace compilation errors:

/usr/include/linux/kfd_ioctl.h:266:2: error: unknown type name 'uint64_t'
  uint64_t tba_addr;  /* to KFD */
/usr/include/linux/kfd_ioctl.h:267:2: error: unknown type name 'uint64_t'
  uint64_t tma_addr;  /* to KFD */
/usr/include/linux/kfd_ioctl.h:268:2: error: unknown type name 'uint32_t'
  uint32_t gpu_id;  /* to KFD */
/usr/include/linux/kfd_ioctl.h:269:2: error: unknown type name 'uint32_t'
  uint32_t pad;

Fixes: d7b9bd2248d79 ("drm/amdkfd: Add support for user-mode trap handlers")
Cc:  # v4.16
Signed-off-by: Dmitry V. Levin 
---
 include/uapi/linux/kfd_ioctl.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f4cab5b3ba9a..111d73ba2d96 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -263,10 +263,10 @@ struct kfd_ioctl_get_tile_config_args {
 };
 
 struct kfd_ioctl_set_trap_handler_args {
-   uint64_t tba_addr;  /* to KFD */
-   uint64_t tma_addr;  /* to KFD */
-   uint32_t gpu_id;/* to KFD */
-   uint32_t pad;
+   __u64 tba_addr; /* to KFD */
+   __u64 tma_addr; /* to KFD */
+   __u32 gpu_id;   /* to KFD */
+   __u32 pad;
 };
 
 #define AMDKFD_IOCTL_BASE 'K'
-- 
ldv


[PATCH] uapi: fix asm/bootparam.h userspace compilation errors

2018-04-04 Thread Dmitry V. Levin
Consistently use types provided by  to fix the following
asm/bootparam.h userspace compilation errors:

/usr/include/asm/bootparam.h:140:2: error: unknown type name 'u16'
  u16 version;
/usr/include/asm/bootparam.h:141:2: error: unknown type name 'u16'
  u16 compatible_version;
/usr/include/asm/bootparam.h:142:2: error: unknown type name 'u16'
  u16 pm_timer_address;
/usr/include/asm/bootparam.h:143:2: error: unknown type name 'u16'
  u16 num_cpus;
/usr/include/asm/bootparam.h:144:2: error: unknown type name 'u64'
  u64 pci_mmconfig_base;
/usr/include/asm/bootparam.h:145:2: error: unknown type name 'u32'
  u32 tsc_khz;
/usr/include/asm/bootparam.h:146:2: error: unknown type name 'u32'
  u32 apic_khz;
/usr/include/asm/bootparam.h:147:2: error: unknown type name 'u8'
  u8 standard_ioapic;
/usr/include/asm/bootparam.h:148:2: error: unknown type name 'u8'
  u8 cpu_ids[255];

Fixes: 4a362601baa6 ("x86/jailhouse: Add infrastructure for running in non-root 
cell")
Cc:  # v4.16
Signed-off-by: Dmitry V. Levin 
---
 arch/x86/include/uapi/asm/bootparam.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/uapi/asm/bootparam.h 
b/arch/x86/include/uapi/asm/bootparam.h
index aebf60357758..a06cbf019744 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -137,15 +137,15 @@ struct boot_e820_entry {
  * setup data structure.
  */
 struct jailhouse_setup_data {
-   u16 version;
-   u16 compatible_version;
-   u16 pm_timer_address;
-   u16 num_cpus;
-   u64 pci_mmconfig_base;
-   u32 tsc_khz;
-   u32 apic_khz;
-   u8  standard_ioapic;
-   u8  cpu_ids[255];
+   __u16   version;
+   __u16   compatible_version;
+   __u16   pm_timer_address;
+   __u16   num_cpus;
+   __u64   pci_mmconfig_base;
+   __u32   tsc_khz;
+   __u32   apic_khz;
+   __u8standard_ioapic;
+   __u8cpu_ids[255];
 } __attribute__((packed));
 
 /* The so-called "zeropage" */
-- 
ldv


Re: [PATCH v3 2/4] bus: fsl-mc: add restool userspace support

2018-04-04 Thread Stuart Yoder
On Wed, Apr 4, 2018 at 7:42 AM, Andrew Lunn  wrote:
>> I hear you.  It is more complicated this way...having all these individual
>> objects vs just a single "bundle" of them that represents a NIC.  But, that's
>> the way the DPAA2 hardware is, and we're implementing kernel support for
>> the hardware as it is.
>
> Hi Stuart
>
> I see we are not making any progress here.
>
> So what i suggest is you post the kernel code and configuration tool
> concept to netdev for a full review. You want reviews from David
> Miller, Jiri Pirko, Jakub Kicinski, David Ahern, etc.

I know Ioana has other feedback she is addressing so a respin will be coming
soon, and she can include those additional reviewers.

Thanks,
Stuart


Re: [RFC PATCH 1/1] vmscan: Support multiple kswapd threads per node

2018-04-04 Thread Buddy Lumpkin

> On Apr 3, 2018, at 2:12 PM, Matthew Wilcox  wrote:
> 
> On Tue, Apr 03, 2018 at 01:49:25PM -0700, Buddy Lumpkin wrote:
>>> Yes, very much this.  If you have a single-threaded workload which is
>>> using the entirety of memory and would like to use even more, then it
>>> makes sense to use as many CPUs as necessary getting memory out of its
>>> way.  If you have N CPUs and N-1 threads happily occupying themselves in
>>> their own reasonably-sized working sets with one monster process trying
>>> to use as much RAM as possible, then I'd be pretty unimpressed to see
>>> the N-1 well-behaved threads preempted by kswapd.
>> 
>> The default value provides one kswapd thread per NUMA node, the same
>> it was without the patch. Also, I would point out that just because you 
>> devote
>> more threads to kswapd, doesn’t mean they are busy. If multiple kswapd 
>> threads
>> are busy, they are almost certainly doing work that would have resulted in
>> direct reclaims, which are often substantially more expensive than a couple
>> extra context switches due to preemption.
> 
> [...]
> 
>> In my previous response to Michal Hocko, I described
>> how I think we could scale watermarks in response to direct reclaims, and
>> launch more kswapd threads when kswapd peaks at 100% CPU usage.
> 
> I think you're missing my point about the workload ... kswapd isn't
> "nice", so it will compete with the N-1 threads which are chugging along
> at 100% CPU inside their working sets.  In this scenario, we _don't_
> want to kick off kswapd at all; we want the monster thread to clean up
> its own mess.  If we have idle CPUs, then yes, absolutely, lets have
> them clean up for the monster, but otherwise, I want my N-1 threads
> doing their own thing.
> 
> Maybe we should renice kswapd anyway ... thoughts?  We don't seem to have
> had a nice'd kswapd since 2.6.12, but maybe we played with that earlier
> and discovered it was a bad idea?
> 


Trying to distinguish between the monster and a high value task that you want
to run as quickly as possible would be challenging. I like your idea of using
renice. It probably makes sense to continue to run the first thread on each node
at a standard nice value, and run each additional task with a positive nice 
value.








Re: [PATCH v1] kernel/trace:check the val against the available mem

2018-04-04 Thread Joel Fernandes
On Wed, Apr 4, 2018 at 7:58 PM, Matthew Wilcox  wrote:
> On Wed, Apr 04, 2018 at 11:47:30AM -0400, Steven Rostedt wrote:
>> I originally was going to remove the RETRY_MAYFAIL, but adding this
>> check (at the end of the loop though) appears to have OOM consistently
>> kill this task.
>>
>> I still like to keep RETRY_MAYFAIL, because it wont trigger OOM if
>> nothing comes in and tries to do an allocation, but instead will fail
>> nicely with -ENOMEM.
>
> I still don't get why you want RETRY_MAYFAIL.  You know that tries
> *harder* to allocate memory than plain GFP_KERNEL does, right?  And
> that seems like the exact opposite of what you want.

No. We do want it to try harder but not if its already setup for failure.


KASAN: use-after-free Read in ntfs_read_locked_inode

2018-04-04 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3e968c9f1401088abc9a19ae6ff571644d37a355 (Wed Apr 4 21:19:24 2018 +)
Merge tag 'ext4_for_linus' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=19b469021157c136116a


C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5682455868604416
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5679121900240896
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5751565918928896
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=9118669095563550941

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+19b469021157c1361...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

ntfs: volume version 3.1.
ntfs: volume version 3.1.
ntfs: volume version 3.1.
ntfs: volume version 3.1.
==
BUG: KASAN: use-after-free in ntfs_read_locked_inode+0x47fe/0x51b0  
fs/ntfs/inode.c:670

Read of size 8 at addr 8801becc42e8 by task syzkaller675411/4496

CPU: 0 PID: 4496 Comm: syzkaller675411 Not tainted 4.16.0+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1a7/0x27d lib/dump_stack.c:53
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23c/0x360 mm/kasan/report.c:412
 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443
 ntfs_read_locked_inode+0x47fe/0x51b0 fs/ntfs/inode.c:670
 ntfs_iget+0x1ab/0x240 fs/ntfs/inode.c:190
 load_and_init_quota fs/ntfs/super.c:1406 [inline]
 load_system_files+0x5f06/0x6c80 fs/ntfs/super.c:2117
 ntfs_fill_super+0x1485/0x2fb0 fs/ntfs/super.c:2908
 mount_bdev+0x2b7/0x370 fs/super.c:1119
 ntfs_mount+0x34/0x40 fs/ntfs/super.c:3065
 mount_fs+0x66/0x2d0 fs/super.c:1222
 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0xea4/0x2b90 fs/namespace.c:2847
 ksys_mount+0xab/0x120 fs/namespace.c:3063
 SYSC_mount fs/namespace.c:3077 [inline]
 SyS_mount+0x39/0x50 fs/namespace.c:3074
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x44597a
RSP: 002b:7ffe9dfbd7d8 EFLAGS: 0206 ORIG_RAX: 00a5
RAX: ffda RBX: 2a80 RCX: 0044597a
RDX: 2000 RSI: 2100 RDI: 7ffe9dfbd850
RBP: 0003 R08: 20077a00 R09: 000a
R10:  R11: 0206 R12: 0004
R13: ab33 R14:  R15: 

The buggy address belongs to the page:
page:ea0006fb3100 count:0 mapcount:0 mapping: index:0x1
flags: 0x2fffc00()
raw: 02fffc00  0001 
raw: dead0100 dead0200  
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8801becc4180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 8801becc4200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

8801becc4280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

  ^
 8801becc4300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 8801becc4380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


Re: [PATCH] f2fs: enlarge block plug coverage

2018-04-04 Thread Jaegeuk Kim
On 04/04, Chao Yu wrote:
> This patch enlarges block plug coverage in __issue_discard_cmd, in
> order to collect more pending bios before issuing them, to avoid
> being disturbed by previous discard I/O in IO aware discard mode.

Hmm, then we need to wait for huge discard IO for over 10 secs, which
will affect following read/write IOs accordingly. In order to avoid that,
we actually need to limit the discard size.

Thanks,

> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/segment.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 8f0b5ba46315..4287e208c040 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -1208,10 +1208,12 @@ static int __issue_discard_cmd(struct f2fs_sb_info 
> *sbi,
>   pend_list = &dcc->pend_list[i];
>  
>   mutex_lock(&dcc->cmd_lock);
> +
> + blk_start_plug(&plug);
> +
>   if (list_empty(pend_list))
>   goto next;
>   f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root));
> - blk_start_plug(&plug);
>   list_for_each_entry_safe(dc, tmp, pend_list, list) {
>   f2fs_bug_on(sbi, dc->state != D_PREP);
>  
> @@ -1227,8 +1229,9 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
>   if (++iter >= dpolicy->max_requests)
>   break;
>   }
> - blk_finish_plug(&plug);
>  next:
> + blk_finish_plug(&plug);
> +
>   mutex_unlock(&dcc->cmd_lock);
>  
>   if (iter >= dpolicy->max_requests)
> -- 
> 2.15.0.55.gc2ece9dc4de6


Re: [PATCH 3/4] tracing: Add action comparisons when testing matching hist triggers

2018-04-04 Thread Masami Hiramatsu
On Wed, 04 Apr 2018 10:17:03 -0500
Tom Zanussi  wrote:

> Hi Masami,
> 
> On Wed, 2018-04-04 at 21:33 +0900, Masami Hiramatsu wrote:
> > Hi Tom,
> > 
> > On Mon, 02 Apr 2018 12:09:33 -0500
> > Tom Zanussi  wrote:
> > 
> > > after:
> > > 
> > >   # echo 'wakeup_latency u64 lat; pid_t pid' >> 
> > > /sys/kernel/debug/tracing/synthetic_events
> > >   # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' 
> > > >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
> > >   # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if 
> > > next_comm=="cyclictest"' >> 
> > > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > >   # echo 
> > > 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid)
> > >  if next_comm=="cyclictest"' >> 
> > > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > >   # echo 
> > > 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid)
> > >  if next_comm=="cyclictest"' >> 
> > > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > >   # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> 
> > > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
> > 
> > I ensured this sequence has no problem.
> > After above sequence, the trigger file shows
> > 
> > ==
> > # cat events/sched/sched_switch/trigger
> > hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global
> >  if next_comm=="cyclictest" [active]
> > hist:keys=next_pid:vals=hitcount:sort=hitcount:size=2048:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid)
> >  if next_comm=="cyclictest" [active]
> > hist:keys=next_pid:vals=hitcount:sort=hitcount:size=2048:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid)
> >  if next_comm=="cyclictest" [active]
> > hist:keys=next_pid:vals=hitcount:sort=hitcount:size=2048 if 
> > next_comm=="cyclictest" [active]
> > ==
> > 
> > So I clear the last one
> > 
> > ==
> > # echo '!hist:keys=next_pid if next_comm=="cyclictest"' >> 
> > events/sched/sched_switch/trigger
> > #
> > ==
> > 
> > OK, it should be removed. I can not see it anymore on the trigger file.
> > 
> > ==
> > # cat events/sched/sched_switch/trigger
> > hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global
> >  if next_comm=="cyclictest" [active]
> > hist:keys=next_pid:vals=hitcount:sort=hitcount:size=2048:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid)
> >  if next_comm=="cyclictest" [active]
> > hist:keys=next_pid:vals=hitcount:sort=hitcount:size=2048:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid)
> >  if next_comm=="cyclictest" [active]
> > ==
> > 
> > But when I missed to remove it again, it is accepted (is not an error?)
> > 
> > ==
> > # echo '!hist:keys=next_pid if next_comm=="cyclictest"' >> 
> > events/sched/sched_switch/trigger
> > #
> > ==
> 
> This is consistent with existing behavior, not only for the hist
> triggers but ftrace in general I think e.g.
> 
> # echo 'traceoff:1 if nr_rq > 1' >> 
> /sys/kernel/debug/tracing/events/block/block_unplug/trigger
> # echo '!traceoff:1 if nr_rq > 1' >> 
> /sys/kernel/debug/tracing/events/block/block_unplug/trigger
> # echo '!traceoff:1 if nr_rq > 1' >> 
> /sys/kernel/debug/tracing/events/block/block_unplug/trigger
> 
> # echo 'try_to_wake_up:enable_event:sched:sched_switch:2' >> set_ftrace_filter
> # echo '!try_to_wake_up:enable_event:sched:sched_switch:2' >> 
> set_ftrace_filter
> # echo '!try_to_wake_up:enable_event:sched:sched_switch:2' >> 
> set_ftrace_filter
> 
> Neither produces an error trying to remove an already-removed trigger.
> Or are you thinking about something else?

Hmm, OK, it is ftrace's behavior.

> > Hmm... anyway, let's clear others too.
> > 
> > ==
> > # echo 
> > '!hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid)
> >  if next_comm=="cyclictest"' >> events/sched/sched_switch/trigger
> > # echo 
> > '!hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid)
> >  if next_comm=="cyclictest"' >> events/sched/sched_switch/trigger
> > # cat events/sched/sched_switch/trigger
> > # Available triggers:
> > # traceon traceoff snapshot stacktrace enable_event disable_event 
> > enable_hist disable_hist hist
> > ==
> > 
> > OK, it is cleared now.
> > 
> > Now I test it again.
> > 
> > ==
> > # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> 
> > events/sched/sched_wakeup/trigger
> > sh: write error: Invalid argument
> > ==
> > 
> > Oops, what's the error?
> > 
> > ==
> > # cat events/sched/sched_switch/hist
> > 
> > ERROR: Variable already defined: ts0
> >   Last command: keys=pid:ts0

Re: [PATCH] f2fs: fix to show encrypt flag in FS_IOC_GETFLAGS

2018-04-04 Thread Jaegeuk Kim
On 04/03, Chao Yu wrote:
> On 2018/4/3 4:21, Jaegeuk Kim wrote:
> > On 04/02, Chao Yu wrote:
> >> This patch fixes to show encrypt flag in FS_IOC_GETFLAGS like ext4 does.
> > 
> > Actually, we have to show internal flags owned by f2fs, not generic ones.
> > We may need to define all of them separately?
> 
> Agreed, I wrote a patch, could check that? and in that patch, do we need to
> delete flag definition f2fs don't use?

IMO, we'd better keep the flags. I merged it and could you add encryption part
on top of it?

Thanks,

> 
> Thanks,
> 
> > 
> >>
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/file.c | 9 +++--
> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >> index 8068b015ece5..271fadadaa36 100644
> >> --- a/fs/f2fs/file.c
> >> +++ b/fs/f2fs/file.c
> >> @@ -1584,8 +1584,13 @@ static int f2fs_ioc_getflags(struct file *filp, 
> >> unsigned long arg)
> >>  {
> >>struct inode *inode = file_inode(filp);
> >>struct f2fs_inode_info *fi = F2FS_I(inode);
> >> -  unsigned int flags = fi->i_flags &
> >> -  (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL);
> >> +  unsigned int flags = fi->i_flags;
> >> +
> >> +  if (file_is_encrypt(inode))
> >> +  flags |= FS_ENCRYPT_FL;
> >> +
> >> +  flags &= FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL;
> >> +
> >>return put_user(flags, (int __user *)arg);
> >>  }
> >>  
> >> -- 
> >> 2.15.0.55.gc2ece9dc4de6
> > 
> > .
> > 


[GIT PULL] f2fs update for 4.17-rc1

2018-04-04 Thread Jaegeuk Kim
Hi Linus,

Could you please consider this pull request?

Thanks,

The following changes since commit 3664ce2d930983966d2aac0e167f1332988c4e25:

  Merge tag 'powerpc-4.16-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (2018-02-24 
16:05:50 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git 
tags/f2fs-for-4.17

for you to fetch changes up to 214c2461a864a46b11856426b80dc7db453043c5:

  f2fs: remain written times to update inode during fsync (2018-04-03 18:52:47 
-0700)


f2fs-for-4.17-rc1

In this round, we've mainly focused on performance tuning and critical bug fixes
occurred in low-end devices. Sheng Yong introduced lost_found feature to keep
missing files during recovery instead of thrashing them. We're preparing coming
fsverity implementation. And, we've got more features to communicate with users
for better performance. In low-end devices, some memory-related issues were
fixed, and subtle race condtions and corner cases were addressed as well.

Enhancement:
 - large nat bitmaps for more free node ids
 - add three block allocation policies to pass down write hints given by user
 - expose extension list to user and introduce hot file extension
 - tune small devices seamlessly for low-end devices
 - set readdir_ra by default
 - give more resources under gc_urgent mode regarding to discard and cleaning
 - introduce fsync_mode to enforce posix or not
 - nowait aio support
 - add lost_found feature to keep dangling inodes
 - reserve bits for future fsverity feature
 - add test_dummy_encryption for FBE

Bug fix:
 - don't use highmem for dentry pages
 - align memory boundary for bitops
 - truncate preallocated blocks in write errors
 - guarantee i_times on fsync call
 - clear CP_TRIMMED_FLAG correctly
 - prevent node chain loop during recovery
 - avoid data race between atomic write and background cleaning
 - avoid unnecessary selinux violation warnings on resgid option
 - GFP_NOFS to avoid deadlock in quota and read paths
 - fix f2fs_skip_inode_update to allow i_size recovery

In addition to them, there are several minor bug fixes and clean-ups.


Chao Yu (16):
  f2fs: restrict inline_xattr_size configuration
  f2fs: fix to check extent cache in f2fs_drop_extent_tree
  f2fs: support large nat bitmap
  f2fs: fix to clear CP_TRIMMED_FLAG
  f2fs: fix to handle looped node chain during recovery
  f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
  f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
  f2fs: expose extension_list sysfs entry
  f2fs: fix to avoid race in between atomic write and background GC
  f2fs: support hot file extension
  f2fs: wrap sb_rdonly with f2fs_readonly
  f2fs: fix to restore old mount option in ->remount_fs
  f2fs: wrap all options with f2fs_sb_info.mount_opt
  f2fs: remove unneeded set_cold_node()
  f2fs: clean up with F2FS_BLK_ALIGN
  f2fs: don't track new nat entry in nat set

Colin Ian King (1):
  f2fs: remove redundant initialization of pointer 'p'

Eric Biggers (1):
  f2fs: reserve bits for fs-verity

Gao Xiang (1):
  f2fs: flush cp pack except cp pack 2 page at first

Hyunchul Lee (4):
  f2fs: support passing down write hints given by users to block layer
  f2fs: support passing down write hints to block layer with F2FS policy
  f2fs: Add the 'whint_mode' mount option to f2fs documentation
  f2fs: add nowait aio support

Jaegeuk Kim (11):
  f2fs: handle quota for orphan inodes
  f2fs: don't stop GC if GC is contended
  f2fs: add mount option for segment allocation policy
  f2fs: add auto tuning for small devices
  f2fs: set readdir_ra by default
  f2fs: issue discard aggressively in the gc_urgent mode
  f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
  f2fs: avoid selinux denial on CAP_SYS_RESOURCE
  f2fs: align memory boundary for bitops
  f2fs: truncate preallocated blocks in error case
  f2fs: remain written times to update inode during fsync

Junling Zheng (2):
  f2fs: introduce mount option for fsync mode
  f2fs: fix a wrong condition in f2fs_skip_inode_update

Qiuyang Sun (1):
  f2fs: release locks before return in f2fs_ioc_gc_range()

Ritesh Harjani (1):
  f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read

Sheng Yong (4):
  f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
  f2fs: clean up f2fs_sb_has_xxx functions
  f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
  f2fs: introduce a new mount option test_dummy_encryption

Tiezhu Yang (1):
  f2fs: remove redundant check of page type when submit bio

Yunlei He (3):
  f2fs: Don't overwrite all types of node to keep node chain
  f2fs: check blkaddr more accuratly before issue

Re: WARNING in up_write

2018-04-04 Thread Matthew Wilcox
On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > >  wrote:
> > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 
> > > > up_write+0x1cc/0x210
> > > > kernel/locking/rwsem.c:133
> > > > Kernel panic - not syncing: panic_on_warn set ...
> > 
> > Message-Id: <1522852646-2196-1-git-send-email-long...@redhat.com>
> >
> 
> We were way ahead of syzbot in this case.  :-)

Not really ... syzbot caught it Monday evening ;-)

Date: Mon, 02 Apr 2018 19:01:01 -0700
From: syzbot 
To: linux-fsde...@vger.kernel.org, linux-kernel@vger.kernel.org,
syzkaller-b...@googlegroups.com, v...@zeniv.linux.org.uk
Subject: WARNING in up_write


Re: WARNING in up_write

2018-04-04 Thread Theodore Y. Ts'o
On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> >  wrote:
> > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 
> > > up_write+0x1cc/0x210
> > > kernel/locking/rwsem.c:133
> > > Kernel panic - not syncing: panic_on_warn set ...
> 
> Message-Id: <1522852646-2196-1-git-send-email-long...@redhat.com>
>

We were way ahead of syzbot in this case.  :-)

I reported the problem Tuesday morning:

https://lkml.org/lkml/2018/4/4/814

And within a few hours Waiman had proposed a fix:

https://patchwork.kernel.org/patch/10322639/

Note also that it's not ext4 specific.  It can be trivially reproduced using 
any one of:

kvm-xfstests -c ext4 generic/068
kvm-xfstests -c btrfs generic/068
kvm-xfstests -c xfs generic/068

(Basically, any file system that supports freeze/thaw.)

Cheers,

- Ted


Re: [PATCH] locking/rwsem: Add up_write_non_owner() for percpu_up_write()

2018-04-04 Thread Theodore Y. Ts'o
On Wed, Apr 04, 2018 at 10:37:26AM -0400, Waiman Long wrote:
> The commit 8c5db92a705d ("locking/rwsem: Add DEBUG_RWSEMS to look for
> lock/unlock mismatches") causes a warning in ext4 fstests due to the
> fact that the freeze and thaw ioctls, by design, are run in different
> processes.

While true, it's not just ext4 fstests --- it's any file system which
supports freeze/thaw, since it's happening in the vfs layer.  I have
just as easily replicated the problem using "kvm-xfstests -c xfs
generic/068".  Or "kvm-xfstests -c btrfs generic/068".

Cheers,

- Ted


Re: [PATCH v1] kernel/trace:check the val against the available mem

2018-04-04 Thread Matthew Wilcox
On Wed, Apr 04, 2018 at 11:47:30AM -0400, Steven Rostedt wrote:
> I originally was going to remove the RETRY_MAYFAIL, but adding this
> check (at the end of the loop though) appears to have OOM consistently
> kill this task.
> 
> I still like to keep RETRY_MAYFAIL, because it wont trigger OOM if
> nothing comes in and tries to do an allocation, but instead will fail
> nicely with -ENOMEM.

I still don't get why you want RETRY_MAYFAIL.  You know that tries
*harder* to allocate memory than plain GFP_KERNEL does, right?  And
that seems like the exact opposite of what you want.


Re: [PATCH] gup: return -EFAULT on access_ok failure

2018-04-04 Thread Linus Torvalds
On Wed, Apr 4, 2018 at 6:53 PM, Michael S. Tsirkin  wrote:
>
> Any feedback on this? As this fixes a bug in vhost, I'll merge
> through the vhost tree unless someone objects.

NAK.

__get_user_pages_fast() returns the number of pages it gets.

It has never returned an error code, and all the other versions of it
(architecture-specific) don't either.

If you ask for one page, and get zero pages, then that's an -EFAULT.
Note that that's an EFAULT regardless of whether that zero page
happened due to kernel addresses or just lack of mapping in user
space.

The documentation is simply wrong if it says anything else. Fix the
docs, and fix the users.

The correct use has always been to check the number of pages returned.

Just looking around, returning an error number looks like it could
seriously confuse some things. You have things like the kvm code that
does the *right* thing:

unsigned long ... npinned ...

npinned = get_user_pages_fast(uaddr, npages, write ?
FOLL_WRITE : 0, pages);
if (npinned != npages) {
 ...

err:
if (npinned > 0)
release_pages(pages, npinned);

and the above code clearly depends on the actual behavior, not on the
documentation.

Any changes in this area would need some *extreme* care, exactly
because of code like the above that clearly depends on the existing
semantics.

In fact, the documentation really seems to be just buggy. The actual
get_user_pages() function itself is expressly being careful *not* to
return an error code, it even has a comment to the effect ("Have to be
a bit careful with return values").

So the "If no pages were pinned, returns -errno" comment is just bogus.

  Linus


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
Hi David, 

On Wed, Apr 04, 2018 at 05:17:24PM +0100, David Howells wrote:
> Andy Lutomirski  wrote:
> 
> > Since this thread has devolved horribly, I'm going to propose a solution.
> > 
> > 1. Split the "lockdown" state into three levels:  (please don't
> > bikeshed about the names right now.)
> > 
> > LOCKDOWN_NONE: normal behavior
> > 
> > LOCKDOWN_PROTECT_INTEGREITY: kernel tries to keep root from writing to
> > kernel memory
> > 
> > LOCKDOWN_PROTECT_INTEGRITY_AND_SECRECY: kernel tries to keep root from
> > reading or writing kernel memory.
> 
> In theory, it's good idea, but in practice it's not as easy to implement as I
> think you think.
> 
> Let me list here the things that currently get restricted by lockdown:
> 
[...snip]
>  (5) Kexec.
>

About IMA with kernel module signing and kexec(not on x86_64 yet)...

Because IMA can be used to verify the integrity of kernel module or even
the image for kexec. I think that the
IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY must be enabled at runtime
when kernel is locked-down.

Because the root can enroll master key to keyring then IMA trusts the ima key
derived from master key. It causes that the arbitrary signed module can be 
loaded
when the root compromised.

Thanks
Joey Lee


Re: [PATCH] mmc: dw_mmc-k3: Fix DDR52 mode by setting required clock divisor

2018-04-04 Thread zhangfei

+ Hisilicon colleague


On 2018年04月05日 08:51, Shawn Lin wrote:

[+ Zhangfei Gao who added support for hi6220]

On 2018/4/4 23:31, Ryan Grachek wrote:
On Tue, Apr 3, 2018 at 6:31 AM, Shawn Lin > wrote:


    On 2018/3/30 2:24, oscardagrach wrote:

    Need at least one line commit body.

    Signed-off-by: oscardagrach >

    ---
   drivers/mmc/host/dw_mmc-k3.c | 10 --
   1 file changed, 8 insertions(+), 2 deletions(-)

    diff --git a/drivers/mmc/host/dw_mmc-k3.c
    b/drivers/mmc/host/dw_mmc-k3.c
    index 89cdb3d533bb..efc546cb4db8 100644
    --- a/drivers/mmc/host/dw_mmc-k3.c
    +++ b/drivers/mmc/host/dw_mmc-k3.c
    @@ -194,8 +194,14 @@ static void dw_mci_hi6220_set_ios(struct
    dw_mci *host, struct mmc_ios *ios)
         int ret;
         unsigned int clock;
   -     clock = (ios->clock <= 2500) ? 2500 : 
ios->clock;

    -
    +       /* CLKDIV must be 1 for DDR52/8-bit mode */
    +       if (ios->bus_width == MMC_BUS_WIDTH_8 &&
    +               ios->timing == MMC_TIMING_MMC_DDR52) {
    +               mci_writel(host, CLKDIV, 0x1);
    +               clock = ios->clock;
    +       } else {
    +               clock = (ios->clock <= 2500) ? 2500 :
    ios->clock;
    +       }


    I undertand DDR52/8-bit need CLKDIV fixed 1, but shouldn't the 
following

    change is more sensible?

    if (ios->bus_width == MMC_BUS_WIDTH_8 && ios->timing ==
    MMC_TIMING_MMC_DDR52)
         clock = ios->clock * 2;
    else
         clock = (ios->clock <= 2500) ? 2500 : ios->clock;


    The reason is ios->clock is 52MHz and you could claim 104MHz from 
the

    clock provider and let dw_mmc core take care of the divder to be 1.
    Otherwise, you just force it to be DDR52/8-bit with a clk rate of 
26MHz.



         ret = clk_set_rate(host->biu_clk, clock);
         if (ret)
                 dev_warn(host->dev, "failed to set rate
    %uHz\n", clock);





For future wise, please use plain mode mail, but not HTML format.


Your feedback is correct. I see the Rockchip dwmmc driver has a similar
implementation. After applying your suggested changes, however, my board
reports "dwmmc_k3 f723d000.dwmmc0: failed to set rate 10400Hz"
during intialization of eMMC. In addition, I do not see CLKDIV being
set to 1. clk_set_rate fails and I wonder if this is out-of-scope of
the driver.

If I set CLKDIV where I did prior, with your changes, the device fails
to set the clock and falls back to 52 MHz (26 MHz) and works fine, but
again, CLKDIV is reported as 0 (even though it is 1.) One thing of
interest to note is when I manually set the clock by doing:
(echo 10400 > /sys/kernel/debug/mmc0/clock) the device reports back
'mmc_host mmc0: Bus speed (slot 0) = 19840Hz (slot req 10400Hz,
  actual 9920HZ div = 1)' which works reliably and clk_set_rate does
not report any error.



When looking closely into the code, at least dw_mci_hi6220_set_ios
goes wrong with the bus_hz, since it should be ciu_clk but not biu_clk.
"b" stands for bus, and "c" stands for card IMHO, however bus_hz
describs the clock to the card, provided by controller. Does the
following patch help?


diff --git a/drivers/mmc/host/dw_mmc-k3.c b/drivers/mmc/host/dw_mmc-k3.c
index 89cdb3d..9e78cf2 100644
--- a/drivers/mmc/host/dw_mmc-k3.c
+++ b/drivers/mmc/host/dw_mmc-k3.c
@@ -194,13 +194,21 @@ static void dw_mci_hi6220_set_ios(struct dw_mci 
*host, struct mmc_ios *ios)

    int ret;
    unsigned int clock;

-   clock = (ios->clock <= 2500) ? 2500 : ios->clock;
+   if (ios->bus_width == MMC_BUS_WIDTH_8 &&
+   ios->timing == MMC_TIMING_MMC_DDR52)
+   clock = ios->clock * 2;
+   else
+   clock = (ios->clock <= 2500) ? 2500 : ios->clock;

-   ret = clk_set_rate(host->biu_clk, clock);
+   ret = clk_set_rate(host->ciu_clk, clock);
    if (ret)
    dev_warn(host->dev, "failed to set rate %uHz\n", clock);

-   host->bus_hz = clk_get_rate(host->biu_clk);
+   clock = clk_get_rate(host->ciu_clk);
+   if (clock != host->bus_hz) {
+   host->bus_hz = clock;
+   host->current_speed = 0;
+   }
 }



I am not sure where to begin debugging these clock issues and welcome
any feedback.






Re: [rtlwifi-btcoex] Suspicious code in halbtc8821a1ant driver

2018-04-04 Thread Pkshih
On Thu, 2018-04-05 at 01:25 +, Gustavo A. R. Silva wrote:
> Hi all,
> 
> While doing some static analysis I came across the following piece of code at
> drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c:1581:
> 
> 1581 static void btc8821a1ant_act_bt_sco_hid_only_busy(struct btc_coexist 
> *btcoexist,
> 1582   u8 wifi_status)
> 1583 {
> 1584 /* tdma and coex table */
> 1585 btc8821a1ant_ps_tdma(btcoexist, NORMAL_EXEC, true, 5);
> 1586 
> 1587 if (BT_8821A_1ANT_WIFI_STATUS_NON_CONNECTED_ASSO_AUTH_SCAN ==
> 1588 wifi_status)
> 1589 btc8821a1ant_coex_table_with_type(btcoexist, 
> NORMAL_EXEC, 1);
> 1590 else
> 1591 btc8821a1ant_coex_table_with_type(btcoexist, 
> NORMAL_EXEC, 1);
> 1592 }
> 
> The issue here is that the code for both branches of the if-else statement is 
> identical.
> 
> The if-else was introduced a year ago in this commit c6821613e653
> 
> I wonder if an argument should be changed in any of the calls to
> btc8821a1ant_coex_table_with_type?
> 
> 

It looks weird. Since we're in spring vacation, I'll check my colleague next 
Monday.

PK



Re: [PATCH?] reiserfs: prevent panic: don't allow %-char in journal dev. name

2018-04-04 Thread Randy Dunlap
On 04/04/2018 06:48 PM, Jeff Mahoney wrote:
> On 4/4/18 9:45 PM, Andrew Morton wrote:
>> On Wed, 4 Apr 2018 18:25:16 -0700 Randy Dunlap  wrote:
>>
>>> From: Randy Dunlap 
>>>
>>> If the reiserfs mount option's journal name contains a '%' character,
>>> it can lead to a WARN_ONCE() in lib/vsprintf.c::format_decode(),
>>> saying: "Please remove unsupported %/ in format string."
>>> That's OK until panic_on_warn is set, at which point it's dead, Jim.
>>>
>>> To placate this situation, check the journal name string for a '%'
>>> character and return an error if one is found. Also print a warning
>>> (one that won't panic the kernel) about the invalid journal name (e.g.):
>>>
>>>   reiserfs: journal device name is invalid: %/file0
>>>
>>> (In this example, the caller app specified the journal device name as
>>> "%/file0".)
>>>
>>
>> Well, that is a valid filename and we should support it...
>>
>> Isn't the bug in journal_init_dev()?

OK, thanks.

> Yep.  That's exactly it.
> 
> Acked-by: Jeff Mahoney 
> 
> Thanks,
> 
> -Jeff
> 
>> --- a/fs/reiserfs/journal.c~a
>> +++ a/fs/reiserfs/journal.c
>> @@ -2643,7 +2643,7 @@ static int journal_init_dev(struct super
>>  if (IS_ERR(journal->j_dev_bd)) {
>>  result = PTR_ERR(journal->j_dev_bd);
>>  journal->j_dev_bd = NULL;
>> -reiserfs_warning(super,
>> +reiserfs_warning(super, "sh-457",
>>   "journal_init_dev: Cannot open '%s': %i",
>>   jdev_name, result);
>>  return result;
>> _
>>
>>
> 
> 


-- 
~Randy


RE: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Wang, Wei W
On Thursday, April 5, 2018 9:12 AM, Michael S. Tsirkin wrote:
> On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> > On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > I'm afraid the driver couldn't be aware if the added hints are stale
> > or not,
> 
> 
> No - I mean that driver has code that compares two values and stops
> reporting. Can one of the values be stale?

The driver compares "vb->cmd_id_use != vb->cmd_id_received" to decide if it 
needs to stop reporting hints, and cmd_id_received is what the driver reads 
from host (host notifies the driver to read for the latest value). If host 
sends a new cmd id, it will notify the guest to read again. I'm not sure how 
that could be a stale cmd id (or maybe I misunderstood your point here?)

Best,
Wei


WARNING: kobject bug in sysfs_warn_dup

2018-04-04 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3e968c9f1401088abc9a19ae6ff571644d37a355 (Wed Apr 4 21:19:24 2018 +)
Merge tag 'ext4_for_linus' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=ff87a28e665c163aa7f5


C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5104666266304512
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5683447737614336
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5104818200772608
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=9118669095563550941

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ff87a28e665c163aa...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

R10:  R11: 0286 R12: 0003
R13: 0004 R14:  R15: 
[ cut here ]
kobject_add_internal failed for nodev( with -EEXIST, don't try to register  
things with the same name in the same directory.

sysfs: cannot create duplicate filename '/fs/gfs2/nodev('
WARNING: CPU: 1 PID: 4473 at lib/kobject.c:238  
kobject_add_internal+0x8d4/0xbc0 lib/kobject.c:235

CPU: 0 PID: 4474 Comm: syzkaller533472 Not tainted 4.16.0+ #15
Kernel panic - not syncing: panic_on_warn set ...

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1a7/0x27d lib/dump_stack.c:53
 sysfs_warn_dup+0x83/0xa0 fs/sysfs/dir.c:30
 sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:58
 create_dir lib/kobject.c:69 [inline]
 kobject_add_internal+0x335/0xbc0 lib/kobject.c:227
 kobject_add_varg lib/kobject.c:364 [inline]
 kobject_init_and_add+0xf9/0x150 lib/kobject.c:436
 gfs2_sys_fs_add+0x1ff/0x580 fs/gfs2/sys.c:652
 fill_super+0x86f/0x1d70 fs/gfs2/ops_fstype.c:1118
 gfs2_mount+0x587/0x6e0 fs/gfs2/ops_fstype.c:1321
 mount_fs+0x66/0x2d0 fs/super.c:1222
 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0xea4/0x2b90 fs/namespace.c:2847
 ksys_mount+0xab/0x120 fs/namespace.c:3063
 SYSC_mount fs/namespace.c:3077 [inline]
 SyS_mount+0x39/0x50 fs/namespace.c:3074
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4432fa
RSP: 002b:7ffda3d84538 EFLAGS: 0286 ORIG_RAX: 00a5
RAX: ffda RBX: 20001a40 RCX: 004432fa
RDX: 20001a00 RSI: 20001a40 RDI: 7ffda3d84550
RBP:  R08: 20001f00 R09: 000a
R10:  R11: 0286 R12: 0003
R13: 0004 R14:  R15: 
CPU: 1 PID: 4473 Comm: syzkaller533472 Not tainted 4.16.0+ #15
[ cut here ]
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x1a7/0x27d lib/dump_stack.c:53
kobject_add_internal failed for nodev( with -EEXIST, don't try to register  
things with the same name in the same directory.

 panic+0x1f8/0x42c kernel/panic.c:183
WARNING: CPU: 0 PID: 4474 at lib/kobject.c:238  
kobject_add_internal+0x8d4/0xbc0 lib/kobject.c:235

Modules linked in:
 __warn+0x1dc/0x200 kernel/panic.c:547
CPU: 0 PID: 4474 Comm: syzkaller533472 Not tainted 4.16.0+ #15
 report_bug+0x1f4/0x2b0 lib/bug.c:186
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

 fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
RIP: 0010:kobject_add_internal+0x8d4/0xbc0 lib/kobject.c:235
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
RSP: :8801addaf470 EFLAGS: 00010282
RAX: dc08 RBX: 8801d9661110 RCX: 815b5d2e
RDX:  RSI: 110035bb5e3e RDI: 110035bb5e13
RBP: 8801addaf568 R08: 110035bb5dd5 R09: 0001
R10: 0001 R11:  R12: 110035bb5e94
R13: ffef R14: 8801d39ae348 R15: 110035bb5e98
FS:  01db2880() GS:8801db00() knlGS:
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:991
RIP: 0010:kobject_add_internal+0x8d4/0xbc0 lib/kobject.c:235
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 019657b0 CR3: 0001ae0ca000 CR4: 001406f0
RSP: 0018:8801ade37470 EFLAGS: 00010282
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
RAX: dc08 RBX: 8801d9459190 RCX

general protection fault in mount_fs

2018-04-04 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
3e968c9f1401088abc9a19ae6ff571644d37a355 (Wed Apr 4 21:19:24 2018 +)
Merge tag 'ext4_for_linus' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=01ffaf5d9568dd1609f7


C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6269272955289600
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5190169938362368
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5077254979715072
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=9118669095563550941

compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+01ffaf5d9568dd160...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

hfsplus: failed to load root directory
hfsplus: failed to load root directory
hfsplus: failed to load root directory
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 5704 Comm: syzkaller335224 Not tainted 4.16.0+ #15
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:mount_fs+0x92/0x2d0 fs/super.c:1227
RSP: 0018:8801acc9fad0 EFLAGS: 00010202
RAX: dc00 RBX:  RCX: 81b2ddaa
RDX: 0019 RSI:  RDI: 00c8
RBP: 8801acc9fb00 R08: 110035993e87 R09: ed0035993efc
R10:  R11:  R12: 886b29c0
R13: 8801c6243000 R14: 8801ac942ac0 R15: 
FS:  7f8ca5272700() GS:8801db10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 004c5198 CR3: 0001af14e000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0xea4/0x2b90 fs/namespace.c:2847
 ksys_mount+0xab/0x120 fs/namespace.c:3063
 SYSC_mount fs/namespace.c:3077 [inline]
 SyS_mount+0x39/0x50 fs/namespace.c:3074
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x44a34a
RSP: 002b:7f8ca5271d38 EFLAGS: 0297 ORIG_RAX: 00a5
RAX: ffda RBX: 2318 RCX: 0044a34a
RDX: 2000 RSI: 2100 RDI: 7f8ca5271d50
RBP: 006e39c4 R08: 2140 R09: 000a
R10:  R11: 0297 R12: 0003
R13: 0004 R14: 006e39c0 R15: 0073756c70736668
Code: 3d 00 f0 ff ff 48 89 c3 0f 87 eb 01 00 00 e8 66 c6 be ff 48 8d bb c8  
00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00  
0f 85 08 02 00 00 4c 8b b3 c8 00 00 00 4d 85 f6 0f

RIP: mount_fs+0x92/0x2d0 fs/super.c:1227 RSP: 8801acc9fad0
---[ end trace 81f30bb7bca06fe8 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.


Re: [GIT PULL] Staging/IIO driver changes for 4.17-rc1

2018-04-04 Thread Linus Torvalds
On Wed, Apr 4, 2018 at 3:32 AM, Greg KH  wrote:
>
> It is a lot, over 500 changes, but not huge by previous kernel release
> standards.  We deleted more lines than we added again (27k added vs. 91k
> remvoed), thanks to finally being able to delete the IRDA drivers and
> networking code.

Hmm. The irda sysctl tables are still there in the kernel. Overlooked?

Linus


Re: [PATCH] gup: return -EFAULT on access_ok failure

2018-04-04 Thread Michael S. Tsirkin
On Fri, Mar 30, 2018 at 08:37:45PM +0300, Michael S. Tsirkin wrote:
> get_user_pages_fast is supposed to be a faster drop-in equivalent of
> get_user_pages. As such, callers expect it to return a negative return
> code when passed an invalid address, and never expect it to
> return 0 when passed a positive number of pages, since
> its documentation says:
> 
>  * Returns number of pages pinned. This may be fewer than the number
>  * requested. If nr_pages is 0 or negative, returns 0. If no pages
>  * were pinned, returns -errno.
> 
> Unfortunately this is not what the implementation does: it returns 0 if
> passed a kernel address, confusing callers: for example, the following
> is pretty common but does not appear to do the right thing with a kernel
> address:
> 
> ret = get_user_pages_fast(addr, 1, writeable, &page);
> if (ret < 0)
> return ret;
> 
> Change get_user_pages_fast to return -EFAULT when supplied a
> kernel address to make it match expectations.
> 
> __get_user_pages_fast does not seem to be used like this, but let's
> change __get_user_pages_fast as well for consistency and to match
> documentation.
> 
> Lightly tested.
> 
> Cc: Kirill A. Shutemov 
> Cc: Andrew Morton 
> Cc: Huang Ying 
> Cc: Jonathan Corbet 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Cc: Thorsten Leemhuis 
> Cc: sta...@vger.kernel.org
> Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in 
> get_user_pages_fast()")
> Reported-by: syzbot+6304bf97ef436580f...@syzkaller.appspotmail.com
> Signed-off-by: Michael S. Tsirkin 

Any feedback on this? As this fixes a bug in vhost, I'll merge
through the vhost tree unless someone objects.

> ---
>  mm/gup.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 6afae32..5642521 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1749,6 +1749,9 @@ int __get_user_pages_fast(unsigned long start, int 
> nr_pages, int write,
>   unsigned long flags;
>   int nr = 0;
>  
> + if (nr_pages <= 0)
> + return 0;
> +
>   start &= PAGE_MASK;
>   addr = start;
>   len = (unsigned long) nr_pages << PAGE_SHIFT;
> @@ -1756,7 +1759,7 @@ int __get_user_pages_fast(unsigned long start, int 
> nr_pages, int write,
>  
>   if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
>   (void __user *)start, len)))
> - return 0;
> + return -EFAULT;
>  
>   /*
>* Disable interrupts.  We use the nested form as we can already have
> @@ -1806,9 +1809,12 @@ int get_user_pages_fast(unsigned long start, int 
> nr_pages, int write,
>   len = (unsigned long) nr_pages << PAGE_SHIFT;
>   end = start + len;
>  
> + if (nr_pages <= 0)
> + return 0;
> +
>   if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
>   (void __user *)start, len)))
> - return 0;
> + return -EFAULT;
>  
>   if (gup_fast_permitted(start, nr_pages, write)) {
>   local_irq_disable();
> -- 
> MST


[PULL] fwcfg, vhost: features and fixes

2018-04-04 Thread Michael S. Tsirkin
The following changes since commit 0c8efd610b58cb23cefdfa12015799079aef94ae:

  Linux 4.16-rc5 (2018-03-11 17:25:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to dc32bb678e103afbcfa4d814489af0566307f528:

  vhost: add vsock compat ioctl (2018-03-20 03:17:42 +0200)


fw_cfg, vhost: features fixes

This cleans up the qemu fw cfg device driver.
On top of this, vmcore is dumped there on crash to
help debugging witH kASLR enabled.
Also included are some fixes in vhost.

Signed-off-by: Michael S. Tsirkin 


Marc-André Lureau (10):
  fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
  fw_cfg: fix sparse warnings with fw_cfg_file
  fw_cfg: fix sparse warning reading FW_CFG_ID
  fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
  fw_cfg: remove inline from fw_cfg_read_blob()
  fw_cfg: handle fw_cfg_read_blob() error
  fw_cfg: add a public uapi header
  fw_cfg: add DMA register
  crash: export paddr_vmcoreinfo_note()
  fw_cfg: write vmcoreinfo details

Michael S. Tsirkin (1):
  ptr_ring: fix build

Sonny Rao (2):
  vhost: fix vhost ioctl signature to build with clang
  vhost: add vsock compat ioctl

 MAINTAINERS  |   1 +
 drivers/firmware/qemu_fw_cfg.c   | 291 +++
 drivers/vhost/vhost.c|   2 +-
 drivers/vhost/vhost.h|   4 +-
 drivers/vhost/vsock.c|  11 ++
 include/uapi/linux/qemu_fw_cfg.h |  97 +
 kernel/crash_core.c  |   1 +
 tools/virtio/ringtest/ptr_ring.c |   5 +
 8 files changed, 348 insertions(+), 64 deletions(-)
 create mode 100644 include/uapi/linux/qemu_fw_cfg.h


Re: [PATCH?] reiserfs: prevent panic: don't allow %-char in journal dev. name

2018-04-04 Thread Jeff Mahoney
On 4/4/18 9:45 PM, Andrew Morton wrote:
> On Wed, 4 Apr 2018 18:25:16 -0700 Randy Dunlap  wrote:
> 
>> From: Randy Dunlap 
>>
>> If the reiserfs mount option's journal name contains a '%' character,
>> it can lead to a WARN_ONCE() in lib/vsprintf.c::format_decode(),
>> saying: "Please remove unsupported %/ in format string."
>> That's OK until panic_on_warn is set, at which point it's dead, Jim.
>>
>> To placate this situation, check the journal name string for a '%'
>> character and return an error if one is found. Also print a warning
>> (one that won't panic the kernel) about the invalid journal name (e.g.):
>>
>>   reiserfs: journal device name is invalid: %/file0
>>
>> (In this example, the caller app specified the journal device name as
>> "%/file0".)
>>
> 
> Well, that is a valid filename and we should support it...
> 
> Isn't the bug in journal_init_dev()?

Yep.  That's exactly it.

Acked-by: Jeff Mahoney 

Thanks,

-Jeff

> --- a/fs/reiserfs/journal.c~a
> +++ a/fs/reiserfs/journal.c
> @@ -2643,7 +2643,7 @@ static int journal_init_dev(struct super
>   if (IS_ERR(journal->j_dev_bd)) {
>   result = PTR_ERR(journal->j_dev_bd);
>   journal->j_dev_bd = NULL;
> - reiserfs_warning(super,
> + reiserfs_warning(super, "sh-457",
>"journal_init_dev: Cannot open '%s': %i",
>jdev_name, result);
>   return result;
> _
> 
> 


-- 
Jeff Mahoney
SUSE Labs


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
On Wed, Apr 04, 2018 at 11:19:27PM +0100, David Howells wrote:
> Jann Horn  wrote:
> 
> > > Uh, no.  bpf, for example, can be used to modify kernel memory.
> > 
> > I'm pretty sure bpf isn't supposed to be able to modify arbitrary
> > kernel memory. AFAIU if you can use BPF to write to arbitrary kernel
> > memory, that's a bug; with CAP_SYS_ADMIN, you can read from userspace,
> > write to userspace, and read from kernelspace, but you shouldn't be
> > able to write to kernelspace.
> 
> Ah - you may be right.  I seem to have misremembered what Joey Lee wrote in
> his patch description.
>

Sorry for it's my fault to misunderstood the behavoir of bpf with
CAP_SYS_ADMIN.

Joey Lee


[rtlwifi-btcoex] Suspicious code in halbtc8821a1ant driver

2018-04-04 Thread Gustavo A. R. Silva
Hi all,

While doing some static analysis I came across the following piece of code at 
drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtc8821a1ant.c:1581:

1581 static void btc8821a1ant_act_bt_sco_hid_only_busy(struct btc_coexist 
*btcoexist,
1582   u8 wifi_status)
1583 {
1584 /* tdma and coex table */
1585 btc8821a1ant_ps_tdma(btcoexist, NORMAL_EXEC, true, 5);
1586 
1587 if (BT_8821A_1ANT_WIFI_STATUS_NON_CONNECTED_ASSO_AUTH_SCAN ==
1588 wifi_status)
1589 btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 
1);
1590 else
1591 btc8821a1ant_coex_table_with_type(btcoexist, NORMAL_EXEC, 
1);
1592 }

The issue here is that the code for both branches of the if-else statement is 
identical.

The if-else was introduced a year ago in this commit c6821613e653

I wonder if an argument should be changed in any of the calls to 
btc8821a1ant_coex_table_with_type?

What do you think?

Thanks
--
Gustavo


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
Hi Andy,

On Wed, Apr 04, 2018 at 07:49:12AM -0700, Andy Lutomirski wrote:
> Since this thread has devolved horribly, I'm going to propose a solution.
...
> 6. There's a way to *decrease* the lockdown level below the configured
> value.  (This ability itself may be gated by a config option.)
> Choices include a UEFI protected variable, an authenticated flag
> passed by the bootloader, and even just some special flag in the boot
> handoff protocol.  It would be really quite useful for a user to be
> able to ask their bootloader to reduce the lockdown level for the
> purpose of a particular boot for debugging.  I read the docs on

The "mokutil --disable-validation" done a similar bahvior as above.
Just it lets kernel to ignore the secure boot. 

> mokutil --disable-validation, and it's quite messy.  Let's have a way
> to do this that is mostly independent of the particular firmware in
> use.
>

Why the disabl-validation is messy?   
The mokutil is shim specific but not dependent on particular firmware. 
 
> I can imagine a grub option that decreases lockdown level along with a
> rule that grub will *not* load that option from its config, for
> example.
>

The root can modify the grub config to decrease lockdown level in next
boot without physcial accessing. The mokutil's interactive UI is used
to deal with user to confirm the physcial accessing.

Thanks
Joey Lee


Re: [PATCH?] reiserfs: prevent panic: don't allow %-char in journal dev. name

2018-04-04 Thread Andrew Morton
On Wed, 4 Apr 2018 18:25:16 -0700 Randy Dunlap  wrote:

> From: Randy Dunlap 
> 
> If the reiserfs mount option's journal name contains a '%' character,
> it can lead to a WARN_ONCE() in lib/vsprintf.c::format_decode(),
> saying: "Please remove unsupported %/ in format string."
> That's OK until panic_on_warn is set, at which point it's dead, Jim.
> 
> To placate this situation, check the journal name string for a '%'
> character and return an error if one is found. Also print a warning
> (one that won't panic the kernel) about the invalid journal name (e.g.):
> 
>   reiserfs: journal device name is invalid: %/file0
> 
> (In this example, the caller app specified the journal device name as
> "%/file0".)
> 

Well, that is a valid filename and we should support it...

Isn't the bug in journal_init_dev()?

--- a/fs/reiserfs/journal.c~a
+++ a/fs/reiserfs/journal.c
@@ -2643,7 +2643,7 @@ static int journal_init_dev(struct super
if (IS_ERR(journal->j_dev_bd)) {
result = PTR_ERR(journal->j_dev_bd);
journal->j_dev_bd = NULL;
-   reiserfs_warning(super,
+   reiserfs_warning(super, "sh-457",
 "journal_init_dev: Cannot open '%s': %i",
 jdev_name, result);
return result;
_



Re: linux-next: manual merge of the net-next tree with the pci tree

2018-04-04 Thread Stephen Rothwell
Hi all,

On Tue, 3 Apr 2018 13:14:54 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the net-next tree got a conflict in:
> 
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> 
> between commit:
> 
>   2907938d2375 ("net/mlx5e: Use pcie_bandwidth_available() to compute 
> bandwidth")
> 
> from the pci tree and commit:
> 
>   0608d4dbaf4e ("net/mlx5e: Unify slow PCI heuristic")
> 
> from the net-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 884337f88589,0aab3afc6885..
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@@ -3880,16 -4026,50 +4033,20 @@@ void mlx5e_build_default_indir_rqt(u32 
>   indirection_rqt[i] = i % num_channels;
>   }
>   
> - static bool cqe_compress_heuristic(u32 link_speed, u32 pci_bw)
>  -static int mlx5e_get_pci_bw(struct mlx5_core_dev *mdev, u32 *pci_bw)
>  -{
>  -enum pcie_link_width width;
>  -enum pci_bus_speed speed;
>  -int err = 0;
>  -
>  -err = pcie_get_minimum_link(mdev->pdev, &speed, &width);
>  -if (err)
>  -return err;
>  -
>  -if (speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
>  -return -EINVAL;
>  -
>  -switch (speed) {
>  -case PCIE_SPEED_2_5GT:
>  -*pci_bw = 2500 * width;
>  -break;
>  -case PCIE_SPEED_5_0GT:
>  -*pci_bw = 5000 * width;
>  -break;
>  -case PCIE_SPEED_8_0GT:
>  -*pci_bw = 8000 * width;
>  -break;
>  -default:
>  -return -EINVAL;
>  -}
>  -
>  -return 0;
>  -}
>  -
> + static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
>   {
> - return (link_speed && pci_bw &&
> - (pci_bw < 4) && (pci_bw < link_speed));
> - }
> + u32 link_speed = 0;
> + u32 pci_bw = 0;
>   
> - static bool hw_lro_heuristic(u32 link_speed, u32 pci_bw)
> - {
> - return !(link_speed && pci_bw &&
> -  (pci_bw <= 16000) && (pci_bw < link_speed));
> + mlx5e_get_max_linkspeed(mdev, &link_speed);
>  -mlx5e_get_pci_bw(mdev, &pci_bw);
> ++pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
> + mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
> +link_speed, pci_bw);
> + 
> + #define MLX5E_SLOW_PCI_RATIO (2)
> + 
> + return link_speed && pci_bw &&
> + link_speed > MLX5E_SLOW_PCI_RATIO * pci_bw;
>   }
>   
>   void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, u8 
> cq_period_mode)

This is now a conflict between the pci tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell


pgpraz7Cqf_sV.pgp
Description: OpenPGP digital signature


Re: [PATCH] soc: bcm: raspberrypi-power: Fix use of __packed

2018-04-04 Thread Sasha Levin
Hi Florian Fainelli.

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag.
fixing commit: a09cd356586d ARM: bcm2835: add rpi power domain driver.

The bot has also determined it's probably a bug fixing patch. (score: 35.5765)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!

--
Thanks.
Sasha

Re: [PATCH] resource: Fix integer overflow at reallocation

2018-04-04 Thread Sasha Levin
Hi Takashi Iwai.

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag.
fixing commit: 23c570a67448 resource: ability to resize an allocated resource.

The bot has also determined it's probably a bug fixing patch. (score: 99.2157)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

--
Thanks.
Sasha

Re: [PATCH net] netns: filter uevents correctly

2018-04-04 Thread Christian Brauner
On Wed, Apr 04, 2018 at 05:38:02PM -0500, Eric W. Biederman wrote:
> Christian Brauner  writes:
> 
> > On Wed, Apr 04, 2018 at 09:48:57PM +0200, Christian Brauner wrote:
> >> commit 07e98962fa77 ("kobject: Send hotplug events in all network 
> >> namespaces")
> >> 
> >> enabled sending hotplug events into all network namespaces back in 2010.
> >> Over time the set of uevents that get sent into all network namespaces has
> >> shrunk. We have now reached the point where hotplug events for all devices
> >> that carry a namespace tag are filtered according to that namespace.
> >> 
> >> Specifically, they are filtered whenever the namespace tag of the kobject
> >> does not match the namespace tag of the netlink socket. One example are
> >> network devices. Uevents for network devices only show up in the network
> >> namespaces these devices are moved to or created in.
> >> 
> >> However, any uevent for a kobject that does not have a namespace tag
> >> associated with it will not be filtered and we will *try* to broadcast it
> >> into all network namespaces.
> >> 
> >> The original patchset was written in 2010 before user namespaces were a
> >> thing. With the introduction of user namespaces sending out uevents became
> >> partially isolated as they were filtered by user namespaces:
> >> 
> >> net/netlink/af_netlink.c:do_one_broadcast()
> >> 
> >> if (!net_eq(sock_net(sk), p->net)) {
> >> if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> return;
> >> 
> >> if (!peernet_has_id(sock_net(sk), p->net))
> >> return;
> >> 
> >> if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >>  CAP_NET_BROADCAST))
> >> j   return;
> >> }
> >> 
> >> The file_ns_capable() check will check whether the caller had
> >> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> namespace of interest. This check is fine in general but seems insufficient
> >> to me when paired with uevents. The reason is that devices always belong to
> >> the initial user namespace so uevents for kobjects that do not carry a
> >> namespace tag should never be sent into another user namespace. This has
> >> been the intention all along. But there's one case where this breaks,
> >> namely if a new user namespace is created by root on the host and an
> >> identity mapping is established between root on the host and root in the
> >> new user namespace. Here's a reproducer:
> >> 
> >>  sudo unshare -U --map-root
> >>  udevadm monitor -k
> >>  # Now change to initial user namespace and e.g. do
> >>  modprobe kvm
> >>  # or
> >>  rmmod kvm
> >> 
> >> will allow the non-initial user namespace to retrieve all uevents from the
> >> host. This seems very anecdotal given that in the general case user
> >> namespaces do not see any uevents and also can't really do anything useful
> >> with them.
> >> 
> >> Additionally, it is now possible to send uevents from userspace. As such we
> >> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> namespace of the network namespace of the netlink socket) userspace process
> >> make a decision what uevents should be sent.
> >> 
> >> This makes me think that we should simply ensure that uevents for kobjects
> >> that do not carry a namespace tag are *always* filtered by user namespace
> >> in kobj_bcast_filter(). Specifically:
> >> - If the owning user namespace of the uevent socket is not init_user_ns the
> >>   event will always be filtered.
> >> - If the network namespace the uevent socket belongs to was created in the
> >>   initial user namespace but was opened from a non-initial user namespace
> >>   the event will be filtered as well.
> >> Put another way, uevents for kobjects not carrying a namespace tag are now
> >> always only sent to the initial user namespace. The regression potential
> >> for this is near to non-existent since user namespaces can't really do
> >> anything with interesting devices.
> >> 
> >> Signed-off-by: Christian Brauner 
> >
> > That was supposed to be [PATCH net] not [PATCH net-next] which is
> > obviously closed. Sorry about that.
> 
> This does not appear to be a fix.
> This looks like feature work.
> The motivation appears to be that looks wrong let's change it.

Hm, it looked like an oversight an therefore seems like a bug which is
why I thought would be a good candidate for net. Recent patches to the
semantics here just make it more obvious and provide a better argument
to fix it in the current release rather then defer it to the next one.
But I'm happy to leave this for net-next. I don't want to rush things if
this change in semantics is not trivial enough. For the record, I'm
merely fixing/expanding on the current status quo.

David, is it ok to queue this or would you prefer I resend when net-next
reopens?

> 
> So let's please leave this for when net-next opens again so we can
> have time to fully consider a change in semantics.

Sure, if we a

Re: [PATCH] drm/vc4: Fix memory leak during BO teardown

2018-04-04 Thread Sasha Levin
Hi Daniel J Blueman.

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has also determined it's probably a bug fixing patch. (score: 85.0720)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Failed to apply! Possible dependencies:
463873d57014: ("drm/vc4: Add an API for creating GPU shaders in GEM BOs.")
c826a6e10644: ("drm/vc4: Add a BO cache.")
d5bc60f6ad05: ("drm/vc4: Add create and map BO ioctls.")
c826a6e10644: ("drm/vc4: Add a BO cache.")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks.
Sasha

Re: [PATCH] Input: synaptics-rmi4 - Fix an unchecked out of memory error path

2018-04-04 Thread Sasha Levin
Hi Christophe JAILLET.

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 7.5278)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Failed to apply! Possible dependencies:
8d99758dee31: ("Input: synaptics-rmi4 - add SPI transport driver")
ff8f83708b3e: ("Input: synaptics-rmi4 - add support for 2D sensors and F11")
fdf51604f104: ("Input: synaptics-rmi4 - add I2C transport driver")
2b6a321da9a2: ("Input: synaptics-rmi4 - add support for Synaptics RMI4 
devices")
2b6a321da9a2: ("Input: synaptics-rmi4 - add support for Synaptics RMI4 
devices")
fdf51604f104: ("Input: synaptics-rmi4 - add I2C transport driver")
2b6a321da9a2: ("Input: synaptics-rmi4 - add support for Synaptics RMI4 
devices")
2b6a321da9a2: ("Input: synaptics-rmi4 - add support for Synaptics RMI4 
devices")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks.
Sasha

Re: [PATCH 02/45] Fix exception_enter() return value

2018-04-04 Thread Sasha Levin
Hi David Howells.

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 5.5190)

The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126, 

v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Failed to apply! Possible dependencies:
2e9d1e150abf: ("x86/entry: Avoid interrupt flag save and restore")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks.
Sasha

Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-04 Thread Peter Dolding
On Thu, Apr 5, 2018 at 2:26 AM, Matthew Garrett  wrote:
> On Tue, Apr 3, 2018 at 11:56 PM Peter Dolding  wrote:
>> On Wed, Apr 4, 2018 at 11:13 AM, Matthew Garrett  wrote:
>
>> > There are four cases:
>> >
>> > Verified Boot off, lockdown off: Status quo in distro and mainline
> kernels
>> > Verified Boot off, lockdown on: Perception of security improvement
> that's
>> > trivially circumvented (and so bad)
>> > Verified Boot on, lockdown off: Perception of security improvement
> that's
>> > trivially circumvented (and so bad), status quo in mainline kernels
>> > Verified Boot on, lockdown on: Security improvement, status quo in
> distro
>> > kernels
>> >
>> > Of these four options, only two make sense. The most common
> implementation
>> > of Verified Boot on x86 platforms is UEFI Secure Boot,
>
>> Stop right there.   Verified boot does not have to be UEFI secureboot.
>>You could be using a uboot verified boot or
>> https://www.coreboot.org/git-docs/Intel/vboot.html  google vboot.
>> Neither of these provide flags to kernel to say they have been
>> performed.
>
> They can be modified to set the appropriate bit in the bootparams - the
> reason we can't do that in the UEFI case is that Linux can be built as a
> UEFI binary that the firmware execute directly, and so the firmware has no
> way to set that flag.
>
With some of your embedded hardware boot loaders you have exactly the
same problem.   Where you cannot set bootparams instead have to hard
set everything in the kernel image.  This is why there is a option to
embedded initramfs image inside kernel image because some of them will
only load 1 file.

So not using UEFI  you run into the exact same problem.   So lockdown
on or off need to be a kernel build option setting default.   This
could be 3 options Always on, Always off and "Automatic based on boot
verification system status".

https://linux.die.net/man/8/efibootmgr

Also I have a problem here in non broken UEFI implementations -@ |
--append-binary-args that is very simple set the command line passed
into UEFI binary loaded by the firmware with the Linux kernel this
comes bootparams.   Yes using --append-binary-args can be a pain it is
used to tell the Linux kernel where to find the / drive.   So turning
lockdown off by bootparams is down right possible with working UEFI.
There is a lot of EFI out there that does not work properly.

>> Now Verified Boot on, lockdown off.   Insanely this can be required in
>> diagnostic on some embedded platform because EFI secureboot does not
>> have a off switch.These are platforms where they don't boot if
>> they don't have a PK and KEK set installed.  Yes some of these is jtag
>> the PK and KEK set in.
>
>> The fact that this Verified Boot on, lockdown off causes trouble
>> points to a clear problem.   User owns the hardware they should have
>> the right to defeat secureboot if they wish to.
>
> Which is why Shim allows you to disable validation if you prove physical
> user presence.

Good idea until you have a motherboard where the PS2 ports have failed
and does not support usb keyboard so you have no keyboard until after
the kernel has booted so no way to prove physical presence.   Or are
working on something embedded that has no physical user presence
interface in the boot stages these embedded devices can also be UEFI
with secureboot.  Not everything running UEFI has keyboard,
screenanything that you can prove physical user presence with
sometimes you have to pure depend on the signing key.

If I am a person who has made my own PK and has my own KEK in UEFI
system I should have the right to sign kernel with lockdown off by
default.   I may need this for diagnostics on hardware without user
interface and I may need this because the hardware is broken and I
have set PK and KEK set by direct firmware  flash access possibly by
jtag or possibly before critical port on motherboard died.

Of course I am not saying that Microsoft and others cannot have rules
that say if using their KEK that you cannot do this.   But if the
machine is my hardware and I have set my own PK and KEK set I do know
what I am doing and I should be allowed to compromise security if I
wish its my hardware.   I should not have to custom hack to do it.
Of course I am not saying that the setting in Linux kernel
configuration system cannot have a big warning that you should not do
this unless you have no other valid option and I am not saying that
the kernel should not log/report if it see what appears to be a
questionable configuration like dmesg "SECURITY ISSUE:  UEFI
secureboot detected enabled kernel built with lockdown disabled system
at risk of comprise".  Something audit tools could check logs for. .

So a kernel booting secureboot with lockdown disabled in kernel
configuration is perfectly fine to log a message that this is the
case.   Always forcing lockdown on because you see UEFI secureboot
will cause issues.

Broken hardware to get around a failed motherboard with UEFI
secur

Re: [PATCH] ARM64: dts: meson-axg: enable the eMMC controller

2018-04-04 Thread Yixun Lan
Hi Kevin

On 04/04/2018 02:26 AM, kbuild test robot wrote:
> Hi Nan,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on next-20180403]
> [cannot apply to robh/for-next v4.16 v4.16-rc7 v4.16-rc6 v4.16]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Yixun-Lan/ARM64-dts-meson-axg-enable-the-eMMC-controller/20180403-224314
> config: arm64-allyesconfig (attached as .config)
> compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm64 
> 
> All errors (new ones prefixed by >>):
> 
>>> Error: arch/arm64/boot/dts/amlogic/meson-axg-s400.dts:49.24-25 syntax error
>FATAL ERROR: Unable to parse input tree
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 

oops, need to add this header (I fail to do a last check when rebase to
the dt64 branch)

+#include 

will send a patch v2

Yixun



[PATCH?] reiserfs: prevent panic: don't allow %-char in journal dev. name

2018-04-04 Thread Randy Dunlap
From: Randy Dunlap 

If the reiserfs mount option's journal name contains a '%' character,
it can lead to a WARN_ONCE() in lib/vsprintf.c::format_decode(),
saying: "Please remove unsupported %/ in format string."
That's OK until panic_on_warn is set, at which point it's dead, Jim.

To placate this situation, check the journal name string for a '%'
character and return an error if one is found. Also print a warning
(one that won't panic the kernel) about the invalid journal name (e.g.):

  reiserfs: journal device name is invalid: %/file0

(In this example, the caller app specified the journal device name as
"%/file0".)

Fixes: 
https://syzkaller.appspot.com/bug?id=0627d4551fdc39bf1ef5d82cd9eef587047f7718

Reported-by: syzbot+6bd77b88c1977c03f...@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap 
Cc: sta...@vger.kernel.org # many kernel versions
Cc: reiserfs-de...@vger.kernel.org
Cc: Alexander Viro 
Cc: Jeff Mahoney 
Cc: Jan Kara 
Cc: Frederic Weisbecker 
Cc: Artem Bityutskiy 
Cc: Andrew Morton 
---
 fs/reiserfs/super.c |   11 +++
 1 file changed, 11 insertions(+)

--- lnx-416.orig/fs/reiserfs/super.c
+++ lnx-416/fs/reiserfs/super.c
@@ -1239,6 +1239,8 @@ static int reiserfs_parse_options(struct
}
 
if (c == 'j') {
+   char *badfmt;   // jdev_name (arg) cannot contain '%'
+
if (arg && *arg && jdev_name) {
/* Hm, already assigned? */
if (*jdev_name) {
@@ -1248,6 +1250,15 @@ static int reiserfs_parse_options(struct
 "be %s", *jdev_name);
return 0;
}
+
+   badfmt = strchr(arg, '%');
+   if (badfmt) {
+   printk(KERN_WARNING "reiserfs: "
+"journal device name "
+"is invalid: %s",
+arg);
+   return 0;
+   }
*jdev_name = arg;
}
}



Re: [PATCH] uts_namespace: Move boot_id in uts namespace

2018-04-04 Thread Marian Marinov
On 04/05/2018 03:35 AM, Eric W. Biederman wrote:
> Marian Marinov  writes:
> 
>> On 04/04/2018 07:02 PM, Eric W. Biederman wrote:
>>> Angel Shtilianov  writes:
>>>
 Currently the same boot_id is reported for all containers running
 on a host node, including the host node itself. Even after restarting
 a container it will still have the same persistent boot_id.

 This can cause troubles in cases where you have multiple containers
 from the same cluster on one host node. The software inside each
 container will get the same boot_id and thus fail to join the cluster,
 after the first container from the node has already joined.

 UTS namespace on other hand keeps the machine specific data, so it
 seems to be the correct place to move the boot_id and instantiate it,
 so each container will have unique id for its own boot lifetime, if
 it has its own uts namespace.
>>>
>>> Technically this really needs to use the sysctl infrastructure that
>>> allows you to register different files in different namespaces.  That
>>> way the value you read from proc_do_uuid will be based on who opens the
>>> file not on who is reading the file.
>>
>> Ok, so would you accept a patch that reimplements boot_id trough the sysctl 
>> infrastructure?
> 
> Assuming I am convinced this makes sense to do on the semantic level.
> 
>>> Practically why does a bind mount on top of boot_id work?  What makes
>>> this a general problem worth solving in the kernel?  Why is hiding the
>>> fact that you are running the same instance of the same kernel a useful
>>> thing? That is the reality.
>>
>> The problem is, that the distros do not know that they are in
>> container and don't know that they have to bind mount something on top
>> of boot_id.  You need to tell Docker, LXC/LXD and all other container
>> runtimes that they need to do this bind mount for boot_id.
> 
> Yes.  Anything like this is the responsibility of the container runtime
> one way or another.   Magic to get around fixing the small set of
> container runtimes you care about is a questionable activity.
> 
>> I consider this to be a general issue, that lacks good general
>> solution in userspace.  The kernel is providing this boot_id
>> interface, but it is giving wrong data in the context of containers.
> 
> I disagree.  Namespaces have never been about hiding that you are on a
> given machine or a single machine.  They are about encapsulating global
> identifers so that process migration can happen, and so that processes
> can be better isolated.  The boot_id is not an identify of an object in
> the kernel at all, and it is designed to be trully globally unique
> across time and space so I am not at all certain that it makes the least
> bit of sense to do anything with a boot_id.
> 
> 
> That said my memory of boot_id is that was added so that emacs (and
> related programs) could create lock files on nfs and could tell if the
> current machine owns the file, and if so be able to tell if the owner
> of the lock file is alive.
> 
> So there is an argument to be made that boot_id is to coarse.  That
> argument suggest that boot_id is a pid_namespace property.
> 
> I have not looked at the users of boot_id, and I don't have a definition
> of boot_id that makes me think it is too coarse.
> 
> If you can provide a clear description of what the semantics are and
> what they should be for boot_id showing how boot_id fits into a
> namespace, making it clear what should happen with checkpoint/restart.
> We can definitely look at changing how the kernel supports boot_id.
> 
> The reason I suggested the bind mount is there are lots of cases where
> people want to lie to applications about the reality of what is going on
> for whatever reason, and we leave those lies to userspace.  Things
> like changing the contents of /proc/cpuinfo.

Even thou previously we have sent patches that are supposed to lie to the 
software running inside a container, this one is not of this sort.

You are pointing to Emacs and I'm talking about Consul https://www.consul.io/
Systemd is also using boot_id. "journalctl --list-boots" actually uses the 
boot_id to distinguish between different boots.

  -2 caf0524a1d394ce0bdbcff75b9fe Tue 2015-02-03 21:48:52 UTC—Tue 
2015-02-03 22:17:00 UTC
  -1 13883d180dc0420db0abcb5fa26d6198 Tue 2015-02-03 22:17:03 UTC—Tue 
2015-02-03 22:19:08 UTC
   0 bed718b17a73415fade0e4e7f4bea609 Tue 2015-02-03 22:19:12 UTC—Tue 
2015-02-03 23:01:01 UTC

There may be a lot of other software, that we don't know about, that is using 
boot_id as a good mechanism to distinguish uniqueness of machines.

In case of consul, it uses the boot_id as a unique ID for the node.
On our systems, we have multiple containers residing on the same host node, 
that may be part of the same Consul setup.
One option would be to tell each customer that installs consul, to setup bind 
mounts for the boot_id. However for the customer, finding out that boot_id is 
the 

Re: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Michael S. Tsirkin
On Thu, Apr 05, 2018 at 12:30:27AM +, Wang, Wei W wrote:
> On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> > On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > > +static int add_one_sg(struct virtqueue *vq, unsigned long pfn,
> > > > > +uint32_t len) {
> > > > > + struct scatterlist sg;
> > > > > + unsigned int unused;
> > > > > +
> > > > > + sg_init_table(&sg, 1);
> > > > > + sg_set_page(&sg, pfn_to_page(pfn), len, 0);
> > > > > +
> > > > > + /* Detach all the used buffers from the vq */
> > > > > + while (virtqueue_get_buf(vq, &unused))
> > > > > + ;
> > > > > +
> > > > > + /*
> > > > > +  * Since this is an optimization feature, losing a couple of 
> > > > > free
> > > > > +  * pages to report isn't important. We simply return without 
> > > > > adding
> > > > > +  * the page hint if the vq is full.
> > > > why not stop scanning of following pages though?
> > >
> > > Because continuing to send hints is a way to deliver the maximum
> > > possible hints to host. For example, host may have a delay in taking
> > > hints at some point, and then it resumes to take hints soon. If the
> > > driver does not stop when the vq is full, it will be able to put more
> > > hints to the vq once the vq has available entries to add.
> > 
> > What this appears to be is just lack of coordination between host and guest.
> > 
> > But meanwhile you are spending cycles walking the list uselessly.
> > Instead of trying nilly-willy, the standard thing to do is to wait for host 
> > to
> > consume an entry and proceed.
> > 
> > Coding it up might be tricky, so it's probably acceptable as is for now, but
> > please replace the justification about with a TODO entry that we should
> > synchronize with the host.
> 
> Thanks. I plan to add
> 
> TODO: The current implementation could be further improved by stopping the 
> reporting when the vq is full and continuing the reporting when host notifies 
> that there are available entries for the driver to add.

... that entries have been used.

> 
> > 
> > 
> > >
> > > >
> > > > > +  * We are adding one entry each time, which essentially results 
> > > > > in no
> > > > > +  * memory allocation, so the GFP_KERNEL flag below can be 
> > > > > ignored.
> > > > > +  * Host works by polling the free page vq for hints after 
> > > > > sending the
> > > > > +  * starting cmd id, so the driver doesn't need to kick after 
> > > > > filling
> > > > > +  * the vq.
> > > > > +  * Lastly, there is always one entry reserved for the cmd id to 
> > > > > use.
> > > > > +  */
> > > > > + if (vq->num_free > 1)
> > > > > + return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtio_balloon_send_free_pages(void *opaque, unsigned long
> > pfn,
> > > > > +unsigned long nr_pages)
> > > > > +{
> > > > > + struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> > > > > + uint32_t len = nr_pages << PAGE_SHIFT;
> > > > > +
> > > > > + /*
> > > > > +  * If a stop id or a new cmd id was just received from host, 
> > > > > stop
> > > > > +  * the reporting, and return 1 to indicate an active stop.
> > > > > +  */
> > > > > + if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb-
> > >cmd_id_received)
> > > > > + return 1;
> > 
> > functions returning int should return 0 or -errno on failure, positive 
> > return
> > code should indicate progress.
> > 
> > If you want a boolean, use bool pls.
> 
> OK. I plan to change 1  to -EBUSY to indicate the case that host actively 
> asks the driver to stop reporting (This makes the callback return value type 
> consistent with walk_free_mem_block). 
> 

something like EINTR might be a better fit.

> 
> > 
> > 
> > > > > +
> > > > this access to cmd_id_use and cmd_id_received without locks bothers
> > > > me. Pls document why it's safe.
> > >
> > > OK. Probably we could add below to the above comments:
> > >
> > > cmd_id_use and cmd_id_received don't need to be accessed under locks
> > > because the reporting does not have to stop immediately before
> > > cmd_id_received is changed (i.e. when host requests to stop). That is,
> > > reporting more hints after host requests to stop isn't an issue for
> > > this optimization feature, because host will simply drop the stale
> > > hints next time when it needs a new reporting.
> > 
> > What about the other direction? Can this observe a stale value and exit
> > erroneously?
> 
> I'm afraid the driver couldn't be aware if the added hints are stale or not,


No - I mean that driver has code that compares two values
and stops reporting. Can one of the values be stale?

> because host and guest actions happen asynchronous

Re: [PATCH v7 2/5] of: change overlay apply input data from unflattened to FDT

2018-04-04 Thread Rob Herring
On Wed, Apr 4, 2018 at 5:35 PM, Jan Kiszka  wrote:
> Hi Frank,
>
> On 2018-03-04 01:17, frowand.l...@gmail.com wrote:
>> From: Frank Rowand 
>>
>> Move duplicating and unflattening of an overlay flattened devicetree
>> (FDT) into the overlay application code.  To accomplish this,
>> of_overlay_apply() is replaced by of_overlay_fdt_apply().
>>
>> The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
>> code, which is thus responsible for freeing the duplicate FDT.  The
>> caller of of_overlay_fdt_apply() remains responsible for freeing the
>> original FDT.
>>
>> The unflattened devicetree now belongs to devicetree code, which is
>> thus responsible for freeing the unflattened devicetree.
>>
>> These ownership changes prevent early freeing of the duplicated FDT
>> or the unflattened devicetree, which could result in use after free
>> errors.
>>
>> of_overlay_fdt_apply() is a private function for the anticipated
>> overlay loader.
>
> We are using of_fdt_unflatten_tree + of_overlay_apply in the
> (out-of-tree) Jailhouse loader driver in order to register a virtual
> device during hypervisor activation with Linux. The DT overlay is
> created from a a template but modified prior to application to account
> for runtime-specific parameters. See [1] for the current implementation.
>
> I'm now wondering how to model that scenario best with the new API.
> Given that the loader lost ownership of the unflattened tree but the
> modification API exist only for the that DT state, I'm not yet seeing a
> clear solution. Should we apply the template in disabled form (status =
> "disabled"), modify it, and then activate it while it is already applied?

No. I don't think that will work.

The of_overlay_apply() function is still there, but static. We can
export it again if the need arises.

Another option is there is a notifier callback OF_OVERLAY_PRE_APPLY,
but I'm not sure we want to make that be the normal interface to make
modifications.

Rob


Re: [PATCH] mmc: dw_mmc-k3: Fix DDR52 mode by setting required clock divisor

2018-04-04 Thread Shawn Lin

[+ Zhangfei Gao who added support for hi6220]

On 2018/4/4 23:31, Ryan Grachek wrote:
On Tue, Apr 3, 2018 at 6:31 AM, Shawn Lin > wrote:


On 2018/3/30 2:24, oscardagrach wrote:

Need at least one line commit body.

Signed-off-by: oscardagrach mailto:r...@edited.us>>
---
   drivers/mmc/host/dw_mmc-k3.c | 10 --
   1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc-k3.c
b/drivers/mmc/host/dw_mmc-k3.c
index 89cdb3d533bb..efc546cb4db8 100644
--- a/drivers/mmc/host/dw_mmc-k3.c
+++ b/drivers/mmc/host/dw_mmc-k3.c
@@ -194,8 +194,14 @@ static void dw_mci_hi6220_set_ios(struct
dw_mci *host, struct mmc_ios *ios)
         int ret;
         unsigned int clock;
   -     clock = (ios->clock <= 2500) ? 2500 : ios->clock;
-
+       /* CLKDIV must be 1 for DDR52/8-bit mode */
+       if (ios->bus_width == MMC_BUS_WIDTH_8 &&
+               ios->timing == MMC_TIMING_MMC_DDR52) {
+               mci_writel(host, CLKDIV, 0x1);
+               clock = ios->clock;
+       } else {
+               clock = (ios->clock <= 2500) ? 2500 :
ios->clock;
+       }


I undertand DDR52/8-bit need CLKDIV fixed 1, but shouldn't the following
change is more sensible?

if (ios->bus_width == MMC_BUS_WIDTH_8 && ios->timing ==
MMC_TIMING_MMC_DDR52)
         clock = ios->clock * 2;
else
         clock = (ios->clock <= 2500) ? 2500 : ios->clock;


The reason is ios->clock is 52MHz and you could claim 104MHz from the
clock provider and let dw_mmc core take care of the divder to be 1.
Otherwise, you just force it to be DDR52/8-bit with a clk rate of 26MHz.


         ret = clk_set_rate(host->biu_clk, clock);
         if (ret)
                 dev_warn(host->dev, "failed to set rate
%uHz\n", clock);





For future wise, please use plain mode mail, but not HTML format.


Your feedback is correct. I see the Rockchip dwmmc driver has a similar
implementation. After applying your suggested changes, however, my board
reports "dwmmc_k3 f723d000.dwmmc0: failed to set rate 10400Hz"
during intialization of eMMC. In addition, I do not see CLKDIV being
set to 1. clk_set_rate fails and I wonder if this is out-of-scope of
the driver.

If I set CLKDIV where I did prior, with your changes, the device fails
to set the clock and falls back to 52 MHz (26 MHz) and works fine, but
again, CLKDIV is reported as 0 (even though it is 1.) One thing of
interest to note is when I manually set the clock by doing:
(echo 10400 > /sys/kernel/debug/mmc0/clock) the device reports back
'mmc_host mmc0: Bus speed (slot 0) = 19840Hz (slot req 10400Hz,
  actual 9920HZ div = 1)' which works reliably and clk_set_rate does
not report any error.



When looking closely into the code, at least dw_mci_hi6220_set_ios
goes wrong with the bus_hz, since it should be ciu_clk but not biu_clk.
"b" stands for bus, and "c" stands for card IMHO, however bus_hz
describs the clock to the card, provided by controller. Does the
following patch help?


diff --git a/drivers/mmc/host/dw_mmc-k3.c b/drivers/mmc/host/dw_mmc-k3.c
index 89cdb3d..9e78cf2 100644
--- a/drivers/mmc/host/dw_mmc-k3.c
+++ b/drivers/mmc/host/dw_mmc-k3.c
@@ -194,13 +194,21 @@ static void dw_mci_hi6220_set_ios(struct dw_mci 
*host, struct mmc_ios *ios)

int ret;
unsigned int clock;

-   clock = (ios->clock <= 2500) ? 2500 : ios->clock;
+   if (ios->bus_width == MMC_BUS_WIDTH_8 &&
+   ios->timing == MMC_TIMING_MMC_DDR52)
+   clock = ios->clock * 2;
+   else
+   clock = (ios->clock <= 2500) ? 2500 : ios->clock;

-   ret = clk_set_rate(host->biu_clk, clock);
+   ret = clk_set_rate(host->ciu_clk, clock);
if (ret)
dev_warn(host->dev, "failed to set rate %uHz\n", clock);

-   host->bus_hz = clk_get_rate(host->biu_clk);
+   clock = clk_get_rate(host->ciu_clk);
+   if (clock != host->bus_hz) {
+   host->bus_hz = clock;
+   host->current_speed = 0;
+   }
 }



I am not sure where to begin debugging these clock issues and welcome
any feedback.




Re: rcu_process_callbacks irqsoff latency caused by taking spinlock with irqs disabled

2018-04-04 Thread Nicholas Piggin
On Wed, 4 Apr 2018 17:13:58 -0700
"Paul E. McKenney"  wrote:

> On Thu, Apr 05, 2018 at 09:34:14AM +1000, Nicholas Piggin wrote:
> > Hi Paul,
> > 
> > Just looking at latencies, and RCU showed up as one of the maximums.
> > This is a 2 socket system with (176 CPU threads). Just doing a
> > `make -j 352` kernel build. Got a max latency of 3ms. I don't think
> > that's anything to worry about really, but I wanted to check the
> > cause.  
> 
> Well, that 3 milliseconds would cause serious problems for some workloads...

True.

> > # tracer: irqsoff
> > #
> > # irqsoff latency trace v1.1.5 on 4.16.0-01530-g43d1859f0994
> > # 
> > # latency: 3055 us, #19/19, CPU#135 | (M:server VP:0, KP:0, SP:0 HP:0 
> > #P:176)
> > #-
> > #| task: cc1-58689 (uid:1003 nice:0 policy:0 rt_prio:0)
> > #-
> > #  => started at: rcu_process_callbacks
> > #  => ended at:   _raw_spin_unlock_irqrestore
> > #
> > #
> > #  _--=> CPU#
> > # / _-=> irqs-off
> > #| / _=> need-resched
> > #|| / _---=> hardirq/softirq 
> > #||| / _--=> preempt-depth   
> > # / delay
> > #  cmd pid   | time  |   caller  
> > # \   /  |  \|   / 
> ><...>-58689 135d.s.0us : rcu_process_callbacks
> ><...>-58689 135d.s.1us : cpu_needs_another_gp <-rcu_process_callbacks
> ><...>-58689 135d.s.2us : rcu_segcblist_future_gp_needed 
> > <-cpu_needs_another_gp
> ><...>-58689 135d.s.3us#: _raw_spin_lock <-rcu_process_callbacks
> ><...>-58689 135d.s. 3047us : rcu_start_gp <-rcu_process_callbacks
> ><...>-58689 135d.s. 3048us : rcu_advance_cbs <-rcu_start_gp
> ><...>-58689 135d.s. 3049us : rcu_segcblist_pend_cbs <-rcu_advance_cbs
> ><...>-58689 135d.s. 3049us : rcu_segcblist_advance <-rcu_advance_cbs
> ><...>-58689 135d.s. 3050us : rcu_accelerate_cbs <-rcu_start_gp
> ><...>-58689 135d.s. 3050us : rcu_segcblist_pend_cbs <-rcu_accelerate_cbs
> ><...>-58689 135d.s. 3051us : rcu_segcblist_accelerate 
> > <-rcu_accelerate_cbs
> ><...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 
> > <-rcu_accelerate_cbs
> ><...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 
> > <-rcu_accelerate_cbs
> ><...>-58689 135d.s. 3053us : rcu_start_gp_advanced.isra.35 <-rcu_start_gp
> ><...>-58689 135d.s. 3053us : cpu_needs_another_gp 
> > <-rcu_start_gp_advanced.isra.35
> ><...>-58689 135d.s. 3054us : _raw_spin_unlock_irqrestore 
> > <-rcu_process_callbacks
> ><...>-58689 135d.s. 3055us : _raw_spin_unlock_irqrestore
> ><...>-58689 135d.s. 3056us : trace_hardirqs_on 
> > <-_raw_spin_unlock_irqrestore
> ><...>-58689 135d.s. 3061us : 
> > 
> > So it's taking a contende lock with interrupts disabled:
> > 
> > static void
> > __rcu_process_callbacks(struct rcu_state *rsp)
> > {
> > unsigned long flags;
> > bool needwake;
> > struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
> > 
> > WARN_ON_ONCE(!rdp->beenonline);
> > 
> > /* Update RCU state based on any recent quiescent states. */
> > rcu_check_quiescent_state(rsp, rdp);
> > 
> > /* Does this CPU require a not-yet-started grace period? */
> > local_irq_save(flags);
> > if (cpu_needs_another_gp(rsp, rdp)) {
> > raw_spin_lock_rcu_node(rcu_get_root(rsp)); /* irqs 
> > disabled. */
> > needwake = rcu_start_gp(rsp);
> > raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), 
> > flags);
> > if (needwake)
> > rcu_gp_kthread_wake(rsp);
> > } else {
> > local_irq_restore(flags);
> > }
> > 
> > Because irqs are disabled before taking the lock, we can't spin with
> > interrupts enabled.
> > 
> > cpu_needs_another_gp needs interrupts off to prevent races with normal
> > callback registry, but that doesn't seem to be preventing any wider
> > races in this code, because we immediately re-enable interrupts anyway
> > if no gp is needed. So an interrupt can come in right after that and
> > queue something up.
> > 
> > So then the question is whether it's safe-albeit-racy to call with
> > interrupts ensabled? Would be nice to move it to a spin_lock_irqsave.  
> 
> If I recall correctly, the issue is that an unsynchronized (due to
> interrupts being enabled) check in the "if" statement can cause extra
> unneeded grace periods.

If the check is relatively cheap, perhaps you could do a second race
free verification after taking the lock and disabling interrupts?

> I am guessing that the long latency is caused by lots of CPUs suddenly
> needing a new grace period at about the same time.  If so, this is
> a bottleneck that I have been expecting for some time, and one that
> I would resolve by in

Re: 4.15.14 crash with iscsi target and dvd

2018-04-04 Thread Wakko Warner
Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops.  4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works.  If I 
> > > > mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes.  I'm using the builtin iscsi target with pscsi.  I can burn 
> > > > from
> > > > the initiator with out problems.  I'll test other kernels between 4.9 
> > > > and
> > > > 4.14.
> > > 
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest 
> > > patch
> > > (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find 
> > > -type
> > > f | xargs cat > /dev/null from the initiator.
> > 
> > I tried 4.10.0.  It doesn't completely lockup the system, but the device
> > that was used hangs.  So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0.  Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
> 
> Hello Wakko,
> 
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
>   regression. However, I'm not sure whether you are familiar with the bisect
>   process.
> - Or that you identify the command that triggers this crash such that others
>   can reproduce this issue without needing access to your setup.
> 
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Sorry for not getting back in touch.  My internet was down.  I haven't tried
the patch yet.  I'll try to get to that tomorrow.  The system with the issue
is busy and I can't reboot it right now.


Re: linux-next: manual merge of the sparc-next tree with the arm64 tree

2018-04-04 Thread Stephen Rothwell
Hi all,

On Wed, 21 Mar 2018 10:34:08 +1100 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the sparc-next tree got a conflict in:
> 
>   arch/x86/kernel/signal_compat.c
> 
> between commit:
> 
>   266da65e9156 ("signal: Add FPE_FLTUNK si_code for undiagnosable fp 
> exceptions")
> 
> from the arm64 tree and commit:
> 
>   d84bb709aa4a ("signals, sparc: Add signal codes for ADI violations")
> 
> from the sparc-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/x86/kernel/signal_compat.c
> index d2884e951bb5,838a4dc90a6d..
> --- a/arch/x86/kernel/signal_compat.c
> +++ b/arch/x86/kernel/signal_compat.c
> @@@ -26,8 -26,8 +26,8 @@@ static inline void signal_compat_build_
>* new fields are handled in copy_siginfo_to_user32()!
>*/
>   BUILD_BUG_ON(NSIGILL  != 11);
>  -BUILD_BUG_ON(NSIGFPE  != 13);
>  +BUILD_BUG_ON(NSIGFPE  != 14);
> - BUILD_BUG_ON(NSIGSEGV != 4);
> + BUILD_BUG_ON(NSIGSEGV != 7);
>   BUILD_BUG_ON(NSIGBUS  != 5);
>   BUILD_BUG_ON(NSIGTRAP != 4);
>   BUILD_BUG_ON(NSIGCHLD != 6);

This is now a conflict between the adm64 tree and Linus' tree.

-- 
Cheers,
Stephen Rothwell


pgpEYQ55IBpVY.pgp
Description: OpenPGP digital signature


Re: [PATCH] uts_namespace: Move boot_id in uts namespace

2018-04-04 Thread Eric W. Biederman
Marian Marinov  writes:

> On 04/04/2018 07:02 PM, Eric W. Biederman wrote:
>> Angel Shtilianov  writes:
>> 
>>> Currently the same boot_id is reported for all containers running
>>> on a host node, including the host node itself. Even after restarting
>>> a container it will still have the same persistent boot_id.
>>>
>>> This can cause troubles in cases where you have multiple containers
>>> from the same cluster on one host node. The software inside each
>>> container will get the same boot_id and thus fail to join the cluster,
>>> after the first container from the node has already joined.
>>>
>>> UTS namespace on other hand keeps the machine specific data, so it
>>> seems to be the correct place to move the boot_id and instantiate it,
>>> so each container will have unique id for its own boot lifetime, if
>>> it has its own uts namespace.
>> 
>> Technically this really needs to use the sysctl infrastructure that
>> allows you to register different files in different namespaces.  That
>> way the value you read from proc_do_uuid will be based on who opens the
>> file not on who is reading the file.
>
> Ok, so would you accept a patch that reimplements boot_id trough the sysctl 
> infrastructure?

Assuming I am convinced this makes sense to do on the semantic level.

>> Practically why does a bind mount on top of boot_id work?  What makes
>> this a general problem worth solving in the kernel?  Why is hiding the
>> fact that you are running the same instance of the same kernel a useful
>> thing? That is the reality.
>
> The problem is, that the distros do not know that they are in
> container and don't know that they have to bind mount something on top
> of boot_id.  You need to tell Docker, LXC/LXD and all other container
> runtimes that they need to do this bind mount for boot_id.

Yes.  Anything like this is the responsibility of the container runtime
one way or another.   Magic to get around fixing the small set of
container runtimes you care about is a questionable activity.

> I consider this to be a general issue, that lacks good general
> solution in userspace.  The kernel is providing this boot_id
> interface, but it is giving wrong data in the context of containers.

I disagree.  Namespaces have never been about hiding that you are on a
given machine or a single machine.  They are about encapsulating global
identifers so that process migration can happen, and so that processes
can be better isolated.  The boot_id is not an identify of an object in
the kernel at all, and it is designed to be trully globally unique
across time and space so I am not at all certain that it makes the least
bit of sense to do anything with a boot_id.


That said my memory of boot_id is that was added so that emacs (and
related programs) could create lock files on nfs and could tell if the
current machine owns the file, and if so be able to tell if the owner
of the lock file is alive.

So there is an argument to be made that boot_id is to coarse.  That
argument suggest that boot_id is a pid_namespace property.

I have not looked at the users of boot_id, and I don't have a definition
of boot_id that makes me think it is too coarse.

If you can provide a clear description of what the semantics are and
what they should be for boot_id showing how boot_id fits into a
namespace, making it clear what should happen with checkpoint/restart.
We can definitely look at changing how the kernel supports boot_id.

The reason I suggested the bind mount is there are lots of cases where
people want to lie to applications about the reality of what is going on
for whatever reason, and we leave those lies to userspace.  Things
like changing the contents of /proc/cpuinfo.

> Proposing to fix this problem in userspace seams like ignoring the
> issue.  You could have said to the Consul guys, that they should
> simply stop using boot_id, because it doesn't work correctly on
> containers.

I don't know the Consul guys.   From a quick google search I see that
Consul is an open source project that is aims to be distributed and
highly available.  It seems a reasonable case to look at to motivate
changes to boot_id.

That said if I want to be highly available I would find every node
having the same boot_id to be very worrying, and very useful.  It allows
detecting if no hardware redundancy is present in a situation.  That
certainly seems like a good thing.

If you just want to test Consul then hacking boot_id with a bind mount
seems the right thing.  If you really want to run Consul in production
I am curious to know how removing the ability to detect if you are on
the same kernel as another piece of Consul is a good thing.

Eric


RE: [PATCH v30 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-04-04 Thread Wang, Wei W
On Wednesday, April 4, 2018 10:08 PM, Michael S. Tsirkin wrote:
> On Wed, Apr 04, 2018 at 10:07:51AM +0800, Wei Wang wrote:
> > On 04/04/2018 02:47 AM, Michael S. Tsirkin wrote:
> > > On Wed, Apr 04, 2018 at 12:10:03AM +0800, Wei Wang wrote:
> > > > +static int add_one_sg(struct virtqueue *vq, unsigned long pfn,
> > > > +uint32_t len) {
> > > > +   struct scatterlist sg;
> > > > +   unsigned int unused;
> > > > +
> > > > +   sg_init_table(&sg, 1);
> > > > +   sg_set_page(&sg, pfn_to_page(pfn), len, 0);
> > > > +
> > > > +   /* Detach all the used buffers from the vq */
> > > > +   while (virtqueue_get_buf(vq, &unused))
> > > > +   ;
> > > > +
> > > > +   /*
> > > > +* Since this is an optimization feature, losing a couple of 
> > > > free
> > > > +* pages to report isn't important. We simply return without 
> > > > adding
> > > > +* the page hint if the vq is full.
> > > why not stop scanning of following pages though?
> >
> > Because continuing to send hints is a way to deliver the maximum
> > possible hints to host. For example, host may have a delay in taking
> > hints at some point, and then it resumes to take hints soon. If the
> > driver does not stop when the vq is full, it will be able to put more
> > hints to the vq once the vq has available entries to add.
> 
> What this appears to be is just lack of coordination between host and guest.
> 
> But meanwhile you are spending cycles walking the list uselessly.
> Instead of trying nilly-willy, the standard thing to do is to wait for host to
> consume an entry and proceed.
> 
> Coding it up might be tricky, so it's probably acceptable as is for now, but
> please replace the justification about with a TODO entry that we should
> synchronize with the host.

Thanks. I plan to add

TODO: The current implementation could be further improved by stopping the 
reporting when the vq is full and continuing the reporting when host notifies 
that there are available entries for the driver to add.


> 
> 
> >
> > >
> > > > +* We are adding one entry each time, which essentially results 
> > > > in no
> > > > +* memory allocation, so the GFP_KERNEL flag below can be 
> > > > ignored.
> > > > +* Host works by polling the free page vq for hints after 
> > > > sending the
> > > > +* starting cmd id, so the driver doesn't need to kick after 
> > > > filling
> > > > +* the vq.
> > > > +* Lastly, there is always one entry reserved for the cmd id to 
> > > > use.
> > > > +*/
> > > > +   if (vq->num_free > 1)
> > > > +   return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static int virtio_balloon_send_free_pages(void *opaque, unsigned long
> pfn,
> > > > +  unsigned long nr_pages)
> > > > +{
> > > > +   struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> > > > +   uint32_t len = nr_pages << PAGE_SHIFT;
> > > > +
> > > > +   /*
> > > > +* If a stop id or a new cmd id was just received from host, 
> > > > stop
> > > > +* the reporting, and return 1 to indicate an active stop.
> > > > +*/
> > > > +   if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb-
> >cmd_id_received)
> > > > +   return 1;
> 
> functions returning int should return 0 or -errno on failure, positive return
> code should indicate progress.
> 
> If you want a boolean, use bool pls.

OK. I plan to change 1  to -EBUSY to indicate the case that host actively asks 
the driver to stop reporting (This makes the callback return value type 
consistent with walk_free_mem_block). 



> 
> 
> > > > +
> > > this access to cmd_id_use and cmd_id_received without locks bothers
> > > me. Pls document why it's safe.
> >
> > OK. Probably we could add below to the above comments:
> >
> > cmd_id_use and cmd_id_received don't need to be accessed under locks
> > because the reporting does not have to stop immediately before
> > cmd_id_received is changed (i.e. when host requests to stop). That is,
> > reporting more hints after host requests to stop isn't an issue for
> > this optimization feature, because host will simply drop the stale
> > hints next time when it needs a new reporting.
> 
> What about the other direction? Can this observe a stale value and exit
> erroneously?

I'm afraid the driver couldn't be aware if the added hints are stale or not, 
because host and guest actions happen asynchronously. That is, host side 
iothread stops taking hints as soon as the migration thread asks to stop, it 
doesn't wait for any ACK from the driver to stop (as we discussed before, host 
couldn't always assume that the driver is in a responsive state).

Btw, we also don't need to worry about any memory left in the vq, since only 
addresses are added to the vq, there is no real memory allocations.

Best,
Wei


Re: [PATCH 2/2] efi: Add embedded peripheral firmware support

2018-04-04 Thread Ard Biesheuvel
On 4 April 2018 at 22:25, Hans de Goede  wrote:
> HI,
>
>
> On 04-04-18 19:18, Peter Jones wrote:
>>
>> On Tue, Apr 03, 2018 at 06:58:48PM +, Luis R. Rodriguez wrote:
>>>
>>> On Tue, Apr 03, 2018 at 08:07:11PM +0200, Lukas Wunner wrote:

 On Tue, Apr 03, 2018 at 10:33:25AM +0200, Hans de Goede wrote:
>
> I asked Peter Jones for suggestions how to extract this during boot and
> he suggested seeing if there was a copy of the firmware in the
> EFI_BOOT_SERVICES_CODE memory segment, which it turns out there is.
>
> My patch to add support for this contains a table of device-model (dmi
> strings), firmware header (first 64 bits), length and crc32 and then if
> we boot on a device-model which is in the table the code scans the
> EFI_BOOT_SERVICES_CODE for the prefix, if found checks the crc and
> caches the firmware for later use by request-firmware.
>
> So I just do a brute-force search for the firmware, this really is
> hack,
> nothing standard about it I'm afraid. But it works on 4 different x86
> tablets I have and makes the touchscreen work OOTB on them, so I
> believe
> it is a worthwhile hack to have.


 The EFI Firmware Volume contains a kind of filesystem with files
 identified by GUIDs.  Those files include EFI drivers, ACPI tables,
 DMI data and so on.  It is actually quite common for vendors to
 also include device firmware on the Firmware Volume.  Apple is doing
 this to ship firmware updates e.g. for the GMUX controller found on
 dual GPU MacBook Pros.  If they want to update the controller's
 firmware, they include it in a BIOS update, and an EFI driver checks
 on boot if the firmware update for the controller is necessary and
 if so, flashes it.

 The firmware files you're looking for are almost certainly included
 on the Firmware Volume as individual files.
>>>
>>>
>>> What Hans implemented seems to have been for a specific x86 hack, best if
>>> we
>>> confirm if indeed they are present on the Firmware Volume.
>>
>>
>> To be honest, I'm a bit skeptical about the firmware volume approach.
>> Tools like UEFITool[0] and uefi-firmware-parser[1] have existed for
>> years, still don't seem to reliably parse firmware images I see in the
>> wild, and have a fairly regular need for fixes.  These are tools
>> maintained by smart people who are making a real effort, and it still
>> looks pretty hard to do a good job that applies across a lot of
>> platforms.
>>
>> So I'd rather use Hans's existing patches, at least for now, and if
>> someone is interested in hacking on making an efi firmware volume parser
>> for the kernel, switch them to that when such a thing is ready.
>>
>> [0] g...@github.com:LongSoft/UEFITool.git
>> [1] g...@github.com:theopolis/uefi-firmware-parser.git
>>
 Rather than scraping
 the EFI memory for firmware, I think it would be cleaner and more
 elegant if you just retrieve the files you're interested in from
 the Firmware Volume.

 We're doing something similar with Apple EFI properties, see
 58c5475aba67 and c9cc3aaa0281.

 Basically what you need to do to implement this approach is:

 * Determine the GUIDs used by vendors for the files you're interested
in.  Either dump the Firmware Volume or take an EFI update as
shipped by the vendor, then feed it to UEFIExtract:
https://github.com/LongSoft/UEFITool
* Add the EFI Firmware Volume Protocol to include/linux/efi.h:

 https://www.intel.com/content/dam/doc/reference-guide/efi-firmware-file-volume-specification.pdf

 * Amend arch/x86/boot/compressed/eboot.c to read the files with the
GUIDs you're interested in into memory and pass the files to the
kernel as setup_data payloads.

 * Once the kernel has booted, make the files you've retrieved
available to device drivers as firmware blobs.
>>>
>>>
>>> Happen to know if devices using Firmware Volumes also sign their firmware
>>> and if hw checks the firmware at load time?
>>
>>
>> It varies on a per-device basis, of course.  Most new Intel machines as
>> of Haswell *should* be verifying their system firmware via Boot Guard,
>> which both checks an RSA signature and measures the firmware into the
>> TPM, but as with everything of this nature, there are certainly vendors
>> that screw it up. (I think AMD has something similar, but I'm really not
>> sure.)
>
>
> Lukas, thank you for your suggestions on this, but I doubt that these
> devices use the Firmware Volume stuff.
>

Aren't Firmware Volumes a PI thing rather than a UEFI thing?

> These are really cheap x86 Windows 10 tablets, everything about them is
> simply hacked together by the manufacturer till it boots Windows10 and
> then it is shipped to the customer without receiving any update
> afterwards ever.
>
> What you are describing sounds like significantly more work then
> the ven

Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread Matthew Garrett
On Wed, Apr 4, 2018 at 4:25 PM James Morris  wrote:
> It's surely reasonable to allow an already secure-booted system to be
> debugged without needing to be rebooted.

alt-sysrq-x from a physical console will do that.


Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-04 Thread Matthew Garrett
On Wed, Apr 4, 2018 at 5:05 PM Peter Dolding  wrote:

> > If you don't have secure boot then an attacker with root can modify your
> > bootloader or kernel, and on next boot lockdown can be silently
disabled.

> Stop being narrow minded you don't need secure boot to protect
> bootloader or kernel the classic is only boot from read only media.

And if you use another protected path you can set the appropriate bootparam
flag or pass the appropriate kernel command line argument and gain the same
functionality.


Re: [GIT PULL] x86/build changes for v4.17

2018-04-04 Thread Kees Cook
On Wed, Apr 4, 2018 at 5:05 PM, Linus Torvalds
 wrote:
> On Wed, Apr 4, 2018 at 4:31 PM, Matthias Kaehlcke  wrote:
>>
>> From some experiments it looks like clang, in difference to gcc, does
>> not treat constant values passed as parameters to inline function as
>> constants.
>
> Yeah, I think gcc used to have those semantics a long time ago too.
>
> Many of our __builtin_constant_p() uses are indeed just in macros, but
> certainly not all.
>
> Other examples are found in our "fortified" string functions.
>
> There a clang build will likely simply miss some of the build-time
> fortification checks, and trigger them at runtime instead.
>
> Of course, we hopefully don't *have* any build-time failures, because
> gcc will have caught them, so you won't care as long as clang is a
> secondary compiler, but long-term they'd be good.

Yeah, it's used in inline functions in a lot of places. Some quickly
jump out: kmalloc, crypto, bitmaps, networking, uaccess, kvm, etc from
doing a dumb grep as:

git grep -B5 __builtin_constant_p | grep -A5 inline

FWIW, I prefer inline functions over macros just to keep type checking
a some level of sanity when reading build warnings/errors. ;)

-Kees

-- 
Kees Cook
Pixel Security


Re: [PATCH v5 2/3] lib: Rename compiler intrinsic selects to GENERIC_LIB_*

2018-04-04 Thread Palmer Dabbelt

On Wed, 04 Apr 2018 15:02:58 PDT (-0700), jho...@kernel.org wrote:

On Tue, Apr 03, 2018 at 03:39:34PM -0700, Palmer Dabbelt wrote:
Sorry, I'm not sure if this is the right patch -- someone suggested acking 
this, but it's already Review-By me and if I understand correctly it's going 
through your tree.  I'm a bit new to this, but if it helps then here's a


Acked-By: Palmer Dabbelt 


Thanks Palmer.

No worries. FYI Documentation/process/submitting-patches.rst appears to
now contain lots of gory detail about what Acked-by and Reviewed-by are
supposed to mean.

In this case an acked-by is needed to show your approval of the parts of
the patch which touch the subsystem you are responsible for (and/or code
which you have authored) so that the patch can go via another tree. It
usually indicates that you've reviewed those parts of the patch too, but
not necessarily the whole thing.


Thanks!


Re: rcu_process_callbacks irqsoff latency caused by taking spinlock with irqs disabled

2018-04-04 Thread Paul E. McKenney
On Thu, Apr 05, 2018 at 09:34:14AM +1000, Nicholas Piggin wrote:
> Hi Paul,
> 
> Just looking at latencies, and RCU showed up as one of the maximums.
> This is a 2 socket system with (176 CPU threads). Just doing a
> `make -j 352` kernel build. Got a max latency of 3ms. I don't think
> that's anything to worry about really, but I wanted to check the
> cause.

Well, that 3 milliseconds would cause serious problems for some workloads...

> # tracer: irqsoff
> #
> # irqsoff latency trace v1.1.5 on 4.16.0-01530-g43d1859f0994
> # 
> # latency: 3055 us, #19/19, CPU#135 | (M:server VP:0, KP:0, SP:0 HP:0 #P:176)
> #-
> #| task: cc1-58689 (uid:1003 nice:0 policy:0 rt_prio:0)
> #-
> #  => started at: rcu_process_callbacks
> #  => ended at:   _raw_spin_unlock_irqrestore
> #
> #
> #  _--=> CPU#
> # / _-=> irqs-off
> #| / _=> need-resched
> #|| / _---=> hardirq/softirq 
> #||| / _--=> preempt-depth   
> # / delay
> #  cmd pid   | time  |   caller  
> # \   /  |  \|   / 
><...>-58689 135d.s.0us : rcu_process_callbacks
><...>-58689 135d.s.1us : cpu_needs_another_gp <-rcu_process_callbacks
><...>-58689 135d.s.2us : rcu_segcblist_future_gp_needed 
> <-cpu_needs_another_gp
><...>-58689 135d.s.3us#: _raw_spin_lock <-rcu_process_callbacks
><...>-58689 135d.s. 3047us : rcu_start_gp <-rcu_process_callbacks
><...>-58689 135d.s. 3048us : rcu_advance_cbs <-rcu_start_gp
><...>-58689 135d.s. 3049us : rcu_segcblist_pend_cbs <-rcu_advance_cbs
><...>-58689 135d.s. 3049us : rcu_segcblist_advance <-rcu_advance_cbs
><...>-58689 135d.s. 3050us : rcu_accelerate_cbs <-rcu_start_gp
><...>-58689 135d.s. 3050us : rcu_segcblist_pend_cbs <-rcu_accelerate_cbs
><...>-58689 135d.s. 3051us : rcu_segcblist_accelerate <-rcu_accelerate_cbs
><...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 
> <-rcu_accelerate_cbs
><...>-58689 135d.s. 3052us : trace_rcu_future_gp.isra.0 
> <-rcu_accelerate_cbs
><...>-58689 135d.s. 3053us : rcu_start_gp_advanced.isra.35 <-rcu_start_gp
><...>-58689 135d.s. 3053us : cpu_needs_another_gp 
> <-rcu_start_gp_advanced.isra.35
><...>-58689 135d.s. 3054us : _raw_spin_unlock_irqrestore 
> <-rcu_process_callbacks
><...>-58689 135d.s. 3055us : _raw_spin_unlock_irqrestore
><...>-58689 135d.s. 3056us : trace_hardirqs_on 
> <-_raw_spin_unlock_irqrestore
><...>-58689 135d.s. 3061us : 
> 
> So it's taking a contende lock with interrupts disabled:
> 
> static void
> __rcu_process_callbacks(struct rcu_state *rsp)
> {
> unsigned long flags;
> bool needwake;
> struct rcu_data *rdp = raw_cpu_ptr(rsp->rda);
> 
> WARN_ON_ONCE(!rdp->beenonline);
> 
> /* Update RCU state based on any recent quiescent states. */
> rcu_check_quiescent_state(rsp, rdp);
> 
> /* Does this CPU require a not-yet-started grace period? */
> local_irq_save(flags);
> if (cpu_needs_another_gp(rsp, rdp)) {
> raw_spin_lock_rcu_node(rcu_get_root(rsp)); /* irqs disabled. 
> */
> needwake = rcu_start_gp(rsp);
> raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
> if (needwake)
> rcu_gp_kthread_wake(rsp);
> } else {
> local_irq_restore(flags);
> }
> 
> Because irqs are disabled before taking the lock, we can't spin with
> interrupts enabled.
> 
> cpu_needs_another_gp needs interrupts off to prevent races with normal
> callback registry, but that doesn't seem to be preventing any wider
> races in this code, because we immediately re-enable interrupts anyway
> if no gp is needed. So an interrupt can come in right after that and
> queue something up.
> 
> So then the question is whether it's safe-albeit-racy to call with
> interrupts ensabled? Would be nice to move it to a spin_lock_irqsave.

If I recall correctly, the issue is that an unsynchronized (due to
interrupts being enabled) check in the "if" statement can cause extra
unneeded grace periods.

I am guessing that the long latency is caused by lots of CPUs suddenly
needing a new grace period at about the same time.  If so, this is
a bottleneck that I have been expecting for some time, and one that
I would resolve by introducing funnel locking, sort of like SRCU and
expedited grace periods already use.

Or am I confused about the source of the contention?

Thanx, Paul



[PATCH] brcm80211: brcmsmac: phy_lcn: remove duplicate code

2018-04-04 Thread Gustavo A. R. Silva
Remove and refactor some code in order to avoid having identical code
for different branches.

Notice that this piece of code hasn't been modified since 2011.

Addresses-Coverity-ID: 1226756 ("Identical code for different branches")
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
index 93d4cde..9d830d2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_lcn.c
@@ -3388,13 +3388,8 @@ void wlc_lcnphy_deaf_mode(struct brcms_phy *pi, bool 
mode)
u8 phybw40;
phybw40 = CHSPEC_IS40(pi->radio_chanspec);
 
-   if (LCNREV_LT(pi->pubpi.phy_rev, 2)) {
-   mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
-   mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
-   } else {
-   mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
-   mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
-   }
+   mod_phy_reg(pi, 0x4b0, (0x1 << 5), (mode) << 5);
+   mod_phy_reg(pi, 0x4b1, (0x1 << 9), 0 << 9);
 
if (phybw40 == 0) {
mod_phy_reg((pi), 0x410,
-- 
2.7.4



[RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-04 Thread Deepak Rawat
From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 
---
 drivers/gpu/drm/drm_atomic.c| 42 +
 drivers/gpu/drm/drm_atomic_helper.c |  4 
 drivers/gpu/drm/drm_mode_config.c   |  5 +
 drivers/gpu/drm/drm_plane.c | 12 +++
 include/drm/drm_mode_config.h   | 15 +
 include/drm/drm_plane.h | 16 ++
 include/uapi/drm/drm_mode.h | 15 +
 7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
 }
 
 /**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
  * drm_atomic_get_plane_state - get plane state
  * @state: global atomic state object
  * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);
+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_encoding;
} else if (property == plane->color_range_property) {
*val = state->color_range;
+   } else if (property == config->prop_damage_clips) {
+   *val = (state->damage_clips) ? state->damage_clips->base.id : 0;
} else if (plane->funcs->atomic_get_property) {
return plane->funcs->atomic_get_property(plane, state, 
property, val);
} else {
diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index c356545..55b44e3 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3506,6 +3506,8 @@ void __drm_atomic_helper_plane_duplicate_state(struct 
drm_plane *plane,
 
state->fence = NULL;
state->commit = NULL;
+   state->damage_clips = NULL;
+   state->num_clips = 0;
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_duplicate_state);
 
@@ -3550,6 +3552,8 @@ void __drm_atomic_helper_plane_destroy_state(struct 
drm_plane_state *state)
 
if (state->commit)
drm_crtc_commit_put(state->commit);
+
+   drm_property_blob_put(state->damage_clips);
 }
 EXPORT_SYMBOL(__drm_atomic_helper_plane_destroy_state);
 
diff --git a/drivers/gpu/drm/drm_mode_config.c 
b/drivers/gpu/drm/drm_mode_config.c
index e5c6533..e93b127 100644
--- a/drivers/gpu/drm/drm_mode_config.c
+++ b/drivers/gpu/drm/drm_mode_config.c
@@ -293,6 +293,11 @@ static int drm_mode_create_standard_properties(struct 
drm_device *dev)
return -ENOMEM;
dev->mode_config.prop

Re: [PATCH v2 2/2] io: prevent compiler reordering on the default readX() implementation

2018-04-04 Thread Sinan Kaya
On 4/4/2018 3:50 PM, Arnd Bergmann wrote:
> On Wed, Apr 4, 2018 at 7:48 PM, Sinan Kaya  wrote:
>> On 4/4/2018 11:55 AM, Arnd Bergmann wrote:
>>> Yes, exactly, plus the same for write and in/out of course.
>>
>> I was looking at this...
>>
>> inb() and outb() seem to be calling writeb(). It gets the wmb/barrier 
>> automatically
>> when we fix writeb().
>>
>> Did I miss something?
> 
> At least outb() needs stricter barriers than writeb() in theory, what
> we want here
> is that outb() has not just made it out to the device but that the
> write has been
> confirmed completed by the device. Some architectures can't do it, but those
> that can should have an easy way to hook into that using a separate set of
> barriers.
> 
> Using the riscv barrier names, we could do this like
> 
> #ifndef __io_bw()
> #define __io_bw()  wmb()
> #endif
> 
> #ifndef __io_aw
> #define __io_aw()  barrier()
> #endif
> 
> #ifndef __io_pbw
> #define __io_pbw() __io_bw()
> #endif
> 
> #ifndef __io_paw
> #define __io_paw() __io_aw()
> #endif
> 
> and the same thing for reads. This way, an architecture could override
> any of those, but still get reasonable defaults for the others.
> For __io_bw(), I picked barrier() instead of do {} while (0), no idea
> if that's any better, I just play safe here.

I posted V3. I hope I captured what you mean above correctly.

> 
>  Arnd
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.


[RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-04 Thread Deepak Rawat
With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 
---
 drivers/gpu/drm/drm_atomic_helper.c | 122 
 include/drm/drm_atomic_helper.h |  39 
 2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
 }
 EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;
+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(&iter->plane_src, &iter->fb_src);
+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(&iter->crtc_src, -iter->translate_crtc_x,
+  -iter->translate_crtc_x);
+   drm_rect_intersect(&iter->crtc_src, &iter->plane_src);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init);
+
+/**
+ * drm_atomic_helper_damage_iter_next - advance the damage iterator
+ * @iter: The iterator to advance.
+ * @rect: Return a rectangle in coordinate specified during iterator init.
+ *
+ * Returns:  true if the output is valid, false if we've reached the end of the
+ * rectangle list. If the first call return false, means need full update.
+ */
+bool
+drm_atomic_helper_damage_iter_next(struct drm_atomic_helper_damage_iter *iter,
+  struct drm_rect *rect)
+{
+   const struct drm_rect *curr_clip;
+
+next_clip:
+   if (iter->curr_clip >= iter->num_clips)
+   return false;
+
+   curr_clip = &iter->clips[iter->curr_clip];
+   iter->curr_clip++;
+
+   rect->x1 = curr_clip->x1;
+   rect->x2 = curr_clip->x2;
+   rect->y1 = curr_clip->y1;
+   rect->y2 = curr_clip->y2;
+
+   /* Clip damage rect within fb limit */
+   if (!drm_rect_intersect(rect, &iter->fb_src))
+   goto next_clip;
+   else if (iter->type & DRM_ATOMIC_HELPER_DAMAGE_CLIP_TYPE_FB)
+   return true;
+
+   /* Clip damage rect within plane limit */
+   if (!drm_rect_intersect(rect, &iter->plane_src))
+   goto next_clip;
+   else if (ite

[RFC 3/3] drm: Add helper to validate damage during modeset_check

2018-04-04 Thread Deepak Rawat
This patch adds a helper which should be called by driver which enable
damage (by calling drm_plane_enable_damage_clips) from atomic_check
hook. This helper for now set the damage to NULL for the planes on crtc
which need full modeset.

The driver also need to check for other crtc properties which can
affect damage in atomic_check hook, like gamma, ctm, etc. Plane related
properties which affect damage can be handled in damage iterator.

Signed-off-by: Deepak Rawat 
---
 drivers/gpu/drm/drm_atomic_helper.c | 47 +
 include/drm/drm_atomic_helper.h |  2 ++
 2 files changed, 49 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 355b514..23f44ab 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3987,3 +3987,50 @@ drm_atomic_helper_damage_iter_next(struct 
drm_atomic_helper_damage_iter *iter,
return true;
 }
 EXPORT_SYMBOL(drm_atomic_helper_damage_iter_next);
+
+/**
+ * drm_atomic_helper_check_damage - validate state object for damage changes
+ * @dev: DRM device
+ * @state: the driver state object
+ *
+ * Check the state object if damage is required or not. In case damage is not
+ * required e.g. need modeset, the damage blob is set to NULL.
+ *
+ * NOTE: This helper is not called by core. Those driver which enable damage
+ * using drm_plane_enable_damage_clips should call this from their 
@atomic_check
+ * hook.
+ */
+int drm_atomic_helper_check_damage(struct drm_device *dev,
+  struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_plane *plane;
+   struct drm_crtc_state *crtc_state;
+   unsigned plane_mask;
+   int i;
+
+   for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
+   if (!drm_atomic_crtc_needs_modeset(crtc_state))
+   continue;
+
+   plane_mask = crtc_state->plane_mask;
+   plane_mask |= crtc->state->plane_mask;
+
+   drm_for_each_plane_mask(plane, dev, plane_mask) {
+   struct drm_plane_state *plane_state =
+   drm_atomic_get_plane_state(state, plane);
+
+   if (IS_ERR(plane_state))
+   return PTR_ERR(plane_state);
+
+   if (plane_state->damage_clips) {
+   
drm_property_blob_put(plane_state->damage_clips);
+   plane_state->damage_clips = NULL;
+   plane_state->num_clips = 0;
+   }
+   }
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check_damage);
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index ebd4b66..b12335c 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -224,6 +224,8 @@ drm_atomic_helper_damage_iter_init(struct 
drm_atomic_helper_damage_iter *iter,
 bool
 drm_atomic_helper_damage_iter_next(struct drm_atomic_helper_damage_iter *iter,
   struct drm_rect *rect);
+int drm_atomic_helper_check_damage(struct drm_device *dev,
+  struct drm_atomic_state *state);
 
 /**
  * drm_atomic_crtc_for_each_plane - iterate over planes currently attached to 
CRTC
-- 
2.7.4



[RFC 0/3] drm: page-flip with damage

2018-04-04 Thread Deepak Rawat
Hi All,

This is extension to Lukasz Spintzyk earlier draft of damage interface for drm.
Bascially a new plane property is added called "DAMAGE_CLIPS" which is simply
an array of drm_rect (exported to userspace as drm_mode_rect). The clips
represents damage in framebuffer coordinate of attached fb to the plane.

Helper iterator is added to traverse the damage rectangles and get the damage
clips in framebuffer, plane or crtc coordinates as need by driver
implementation. Finally a helper to reset damage in case need full update is
required. Drivers interested in page-flip with damage should call this from
atomic_check hook.

With the RFC for atomic implementation of dirtyfb ioctl I was thinking
should we need to consider dirty_fb flags, especially
DRM_MODE_FB_DIRTY_ANNOTATE_COPY to be passed with atomic
DAMAGE_CLIPS property blob? I didn't considered that untill now. If no driver
uses that in my opinion for simplicity this can be ignored?

About overlaping of damage rectangles is also not finalized. This really
depends on driver specific implementation and can be left open-ended?

My knowledge is limited to vmwgfx so would like to hear about other driver use
cases and this can be modified in keeping other drivers need.

Going forward driver implementation for vmwgfx and user-space implementation
of kmscube/weston will be next step to test the changes.

Thanks,
Deepak

Deepak Rawat (2):
  drm: Add helper iterator functions to iterate over plane damage.
  drm: Add helper to validate damage during modeset_check

Lukasz Spintzyk (1):
  drm: Add DAMAGE_CLIPS property to plane

 drivers/gpu/drm/drm_atomic.c|  42 +
 drivers/gpu/drm/drm_atomic_helper.c | 173 
 drivers/gpu/drm/drm_mode_config.c   |   5 ++
 drivers/gpu/drm/drm_plane.c |  12 +++
 include/drm/drm_atomic_helper.h |  41 +
 include/drm/drm_mode_config.h   |  15 
 include/drm/drm_plane.h |  16 
 include/uapi/drm/drm_mode.h |  15 
 8 files changed, 319 insertions(+)

-- 
2.7.4



Re: [GIT PULL] x86/build changes for v4.17

2018-04-04 Thread Linus Torvalds
On Wed, Apr 4, 2018 at 4:31 PM, Matthias Kaehlcke  wrote:
>
> From some experiments it looks like clang, in difference to gcc, does
> not treat constant values passed as parameters to inline function as
> constants.

Yeah, I think gcc used to have those semantics a long time ago too.

Many of our __builtin_constant_p() uses are indeed just in macros, but
certainly not all.

Other examples are found in our "fortified" string functions.

There a clang build will likely simply miss some of the build-time
fortification checks, and trigger them at runtime instead.

Of course, we hopefully don't *have* any build-time failures, because
gcc will have caught them, so you won't care as long as clang is a
secondary compiler, but long-term they'd be good.

> I'll ask our compiler folks to take a look, with lower priority than
> other issues in this thread though.

Ack. "asm goto" is way more important. The __builtin_constant_p()
stuff tends to be for various peephole optimizations.

Another example of that can be found in our x86 bit operations macros:

  static __always_inline void
  set_bit(long nr, volatile unsigned long *addr)
  {
if (IS_IMMEDIATE(nr)) {
asm volatile(LOCK_PREFIX "orb %1,%0"
: CONST_MASK_ADDR(nr, addr)
: "iq" ((u8)CONST_MASK(nr))
: "memory");
} else {
asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
}
  }

where that IS_IMMEDIATE() hides a __builtin_constant_p(). But we could
actually change that - for some reason the test_bit() case looks like
this:

  #define test_bit(nr, addr)  \
(__builtin_constant_p((nr)) \
 ? constant_test_bit((nr), (addr))  \
 : variable_test_bit((nr), (addr)))

which is much more straightforward anyway. I'm not quite sure why we
did it that odd way anyway, but I bet it's just "hysterical raisins"
along with the test_bit() not needing inline asm at all for the
constant case.

 Linus


Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-04 Thread Peter Dolding
> If you don't have secure boot then an attacker with root can modify your
> bootloader or kernel, and on next boot lockdown can be silently disabled.

Stop being narrow minded you don't need secure boot to protect
bootloader or kernel the classic is only boot from read only media.

Another is network boot using https can coreboot firmware.   This
checks the certificate  of the https server against selected CA before
downloading anything and as long as the firmware is set read only in
hardware the attack has absolutely nothing to work on.

In fact the network boot form https server is more secure than UEFI
secureboot due to highly limited parties who can alter/provide the
approved boot loader/kernel image.

Having root user rights does not override physical security.The
fact there are other ways of doing bootloader and kernel security
other than UEFI secureboot that are in lots of cases more secure than
UEFI secureboot due to using more limited keys is the absolute reason
why lockdown is required without UEFI secureboot.

It would make sense to extend kexec to support UEFI secureboot
verification  and also kexec to have frameworks to support other
security options like https server storage of all kernel images.
Please note kexec supporting UEFI secureboot verification should also
support booting non UEFI secureboot but verified by some other method
and having own PK/KEK set for kexec and this would be when the Linux
kernel is placed in firmware and used instead of EFI firmware..

Please note there are many UEFI firmwares that with secureboot off
allow setting up secure https bootting where you are not in fact
validating the boot loader or kernel but validating the source you get
them from.

There are three different ways to achieve a protected boot process.
1) validate the boot files.(this is like UEFI secure boot and many
other methods)
2) validate source the boot files.  Yes this can be apply key check to
image if image is not signed don't boot from it and the image contain
boot loader and kernel then not bother validating the boot loader and
kernel image/parts individually same with https.
3) make boot files read only.

All three achieve the same level of security.   If you are using any
of the three lockdown option may provide some benefit.

Yes https network boot effectively does 2 and 3 so making having a
very limit threat against the boot process.

Remember there is more 1 way to skin a cat just like there is more
than 1 way to make a secure system.   Currently being too narrow in
methods for doing protected booting.


Peter Dolding.


Re: [PATCH v5 2/2] KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"

2018-04-04 Thread Wanpeng Li
2018-04-05 1:09 GMT+08:00 Paolo Bonzini :
> On 04/04/2018 15:35, Wanpeng Li wrote:
>> 2018-04-04 19:59 GMT+08:00 David Hildenbrand :
>>>
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 1eb495e..a55ecef 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -146,6 +146,9 @@ bool __read_mostly enable_vmware_backdoor = false;
  module_param(enable_vmware_backdoor, bool, S_IRUGO);
  EXPORT_SYMBOL_GPL(enable_vmware_backdoor);

 +static bool __read_mostly force_emulation_prefix = false;
 +module_param(force_emulation_prefix, bool, S_IRUGO);
 +
  #define KVM_NR_SHARED_MSRS 16

  struct kvm_shared_msrs_global {
 @@ -4844,6 +4847,21 @@ int handle_ud(struct kvm_vcpu *vcpu)
  {
   enum emulation_result er;

 + if (force_emulation_prefix) {
 + char sig[5]; /* ud2; .ascii "kvm" */
 + struct x86_exception e;
 +
 + if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt,
 + kvm_get_linear_rip(vcpu), sig, sizeof(sig), 
 &e))
 + goto emulate_ud;
 +
 + if (memcmp(sig, "\xf\xbkvm", sizeof(sig)) == 0) {
 + kvm_rip_write(vcpu, kvm_rip_read(vcpu) + 
 sizeof(sig));
 + return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>>>
>>> What if we would have an invalid instruction here? Shouldn't you handle
>>> the emulate_instruction() like below?
>>> (e.g. keep a variable with the emulation type (0 vs EMULTYPE_TRAP_UD)
>>> and reuse emulate_ud below)
>>
>> emulate_instruction(vcpu, 0) can handle invalid instruction.
>
> But David's observation is still better because your code doesn't handle 
> usermode exits.

My code handles it, return emulate_instruction(vcpu, 0) ==
EMULATE_DONE, it will return 0 since EMULATE_USER_EXIT == EMULATE_DONE
fails.

> I've fixed this up.

Thanks. The codes similar to my v3 but more beauty. :) I change to
this view since Radim's comments to v3
https://www.spinics.net/lists/kvm/msg166999.html

Regards,
Wanpeng Li


Re: [PATCH] [RFC][WIP] namespace.c: Allow some unprivileged proc mounts when not fully visible

2018-04-04 Thread Eric W. Biederman
Alexey Dobriyan  writes:

>> The only option I have seen proposed that might qualify as something
>> general purpose and simple is a new filesystem that is just the process
>> directories of proc.
>
> While "mount -t pid" and "mount -t sysctl" are decades overdue, I don't
> think they cover everything.
>
> IIRC some gcc versions read /proc/meminfo on every invocation. Now
> imagine such program doesn't have a fallback if /proc/ doesn't exist
> (how many thousands such programs are there?) So user is going to ask
> for /proc with just /proc/meminfo only. At this point it is back to
> nearly full /proc.

To avoid falling susceptible to the kinds of checks in fs_fully_visible we
can only offer information about objects that root in the user namespace
has privilege over.  So "mount -t pid" good.  A "/proc/meminfo" bad.

Which in short means if "mount -t pid" isn't good enough.  There really
isn't anything the kernel can do.

Eric


[PATCH v3 4/5] io: change outX() to have their own IO barrier overrides

2018-04-04 Thread Sinan Kaya
Open code writeX() inside outX() so that outX() variants have their own
overrideable Port IO barrier combinations as __io_pbw() and __io_paw() for
actions to be taken before port IO and after port IO write.

Signed-off-by: Sinan Kaya 
---
 include/asm-generic/io.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index ca268d9..38a96d1 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -456,7 +456,9 @@ static inline u32 inl(unsigned long addr)
 #define outb outb
 static inline void outb(u8 value, unsigned long addr)
 {
-   writeb(value, PCI_IOBASE + addr);
+   __io_pbw();
+   __raw_writeb(value, PCI_IOBASE + addr);
+   __io_paw();
 }
 #endif
 
@@ -464,7 +466,9 @@ static inline void outb(u8 value, unsigned long addr)
 #define outw outw
 static inline void outw(u16 value, unsigned long addr)
 {
-   writew(value, PCI_IOBASE + addr);
+   __io_pbw();
+   __raw_writew(cpu_to_le16(value), PCI_IOBASE + addr);
+   __io_paw();
 }
 #endif
 
@@ -472,7 +476,9 @@ static inline void outw(u16 value, unsigned long addr)
 #define outl outl
 static inline void outl(u32 value, unsigned long addr)
 {
-   writel(value, PCI_IOBASE + addr);
+   __io_pbw();
+   __raw_writel(cpu_to_le32(value), PCI_IOBASE + addr);
+   __io_paw();
 }
 #endif
 
-- 
2.7.4



Re: [PATCH] ring-buffer: Add set/clear_current_oom_origin() during allocations

2018-04-04 Thread Joel Fernandes
Hi Steve,

On Wed, Apr 4, 2018 at 9:18 AM, Joel Fernandes  wrote:
> On Wed, Apr 4, 2018 at 9:13 AM, Steven Rostedt  wrote:
> [..]
>>>
>>> Also, I agree with the new patch and its nice idea to do that.
>>
>> Thanks, want to give it a test too?

With the latest tree and the below diff, I can still OOM-kill a victim
process doing a large buffer_size_kb write:

I pulled your ftrace/core and added this:
+   /*
i = si_mem_available();
if (i < nr_pages)
return -ENOMEM;
+   */

Here's a run in Qemu with 4-cores 1GB total memory:

bash-4.3# ./m -m 1M &
[1] 1056
bash-4.3#
bash-4.3#
bash-4.3#
bash-4.3# echo 1000 > /d/tracing/buffer_size_kb
[   33.213988] Out of memory: Kill process 1042 (bash) score
1712050900 or sacrifice child
[   33.215349] Killed process 1056 (m) total-vm:9220kB,
anon-rss:7564kB, file-rss:4kB, shmem-rss:640kB
bash: echo: write error: Cannot allocate memory
[1]+  Killed  ./m -m 1M
bash-4.3#
--

As you can see, OOM killer triggers and kills "m" which is my busy
memory allocator (it allocates and frees lots of memory and does that
in a loop)

Here's the m program, sorry if it looks too ugly:
https://pastebin.com/raw/aG6Qw37Z

Happy to try anything else, BTW when the si_mem_available check
enabled, this doesn't happen and the buffer_size_kb write fails
normally without hurting anything else.

- Joel


[PATCH v3 2/5] io: define stronger ordering for the default readX() implementation

2018-04-04 Thread Sinan Kaya
The default implementation of mapping readX() to __raw_readX() is wrong.
readX() has stronger ordering semantics. Compiler is allowed to reorder
__raw_readX() against the memory accesses following register read.

Use the previously defined __io_ar() and __io_br() macros to harden
code generation according to architecture support.

Signed-off-by: Sinan Kaya 
---
 include/asm-generic/io.h | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index a3d349e..fc554af 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -153,7 +153,12 @@ static inline void __raw_writeq(u64 value, volatile void 
__iomem *addr)
 #define readb readb
 static inline u8 readb(const volatile void __iomem *addr)
 {
-   return __raw_readb(addr);
+   u8 val;
+
+   __io_br();
+   val = __raw_readb(addr);
+   __io_ar();
+   return val;
 }
 #endif
 
@@ -161,7 +166,12 @@ static inline u8 readb(const volatile void __iomem *addr)
 #define readw readw
 static inline u16 readw(const volatile void __iomem *addr)
 {
-   return __le16_to_cpu(__raw_readw(addr));
+   u16 val;
+
+   __io_br();
+   val = __le16_to_cpu(__raw_readw(addr));
+   __io_ar();
+   return val;
 }
 #endif
 
@@ -169,7 +179,12 @@ static inline u16 readw(const volatile void __iomem *addr)
 #define readl readl
 static inline u32 readl(const volatile void __iomem *addr)
 {
-   return __le32_to_cpu(__raw_readl(addr));
+   u32 val;
+
+   __io_br();
+   val = __le32_to_cpu(__raw_readl(addr));
+   __io_ar();
+   return val;
 }
 #endif
 
@@ -178,7 +193,12 @@ static inline u32 readl(const volatile void __iomem *addr)
 #define readq readq
 static inline u64 readq(const volatile void __iomem *addr)
 {
-   return __le64_to_cpu(__raw_readq(addr));
+   u64 val;
+
+   __io_br();
+   val = __le64_to_cpu(__raw_readq(addr));
+   __io_ar();
+   return val;
 }
 #endif
 #endif /* CONFIG_64BIT */
-- 
2.7.4



[PATCH v3 1/5] io: define several IO & PIO barrier types for the asm-generic version

2018-04-04 Thread Sinan Kaya
Getting ready to harden readX()/writeX() and inX()/outX() semantics for the
generic implementation.

Defining two set of macros as __io_br() and __io_ar() to indicate actions
to be taken before and after MMIO read.

Defining two set of macros as __io_bw() and __io_aw() to indicate actions
to be taken before and after MMIO write.

Defining two set of macros as __io_pbw() and __io_paw() to indicate actions
to be taken before and after Port IO write.

Defining two set of macros as __io_pbr() and __io_par() to indicate actions
to be taken before and after Port IO read.

If rmb() is available for the architecture, prefer rmb() as the default
implementation of __io_ar()/__io_par().

If wmb() is available for the architecture, prefer wmb() as the default
implementation of __io_bw()/__io_pbw().

Signed-off-by: Sinan Kaya 
---
 include/asm-generic/io.h | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index b4531e3..a3d349e 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -25,6 +25,49 @@
 #define mmiowb() do {} while (0)
 #endif
 
+#ifndef __io_br
+#define __io_br()  barrier()
+#endif
+
+#ifndef __io_ar
+#ifdef rmb
+/* prefer rmb() as the default implementation of __io_ar() if supported */
+#define __io_ar()  rmb()
+#else
+#define __io_ar()  barrier()
+#endif
+#endif
+
+#ifndef __io_bw
+#ifdef wmb
+/* prefer wmb() as the default implementation of __io_bw() if supported */
+#define __io_bw()  wmb()
+#else
+#define __io_bw()  barrier()
+#endif
+#endif
+
+#ifndef __io_aw
+#define __io_aw()  barrier()
+#endif
+
+#ifndef __io_pbw
+#define __io_pbw() __io_bw()
+#endif
+
+#ifndef __io_paw
+#define __io_paw() __io_aw()
+#endif
+
+#ifndef __io_pbr
+#define __io_pbr() __io_br()
+#endif
+
+#ifndef __io_par
+#define __io_par() __io_ar()
+#endif
+
+
 /*
  * __raw_{read,write}{b,w,l,q}() access memory in native endianness.
  *
-- 
2.7.4



[PATCH v3 5/5] io: change inX() to have their own IO barrier overrides

2018-04-04 Thread Sinan Kaya
Open code readX() inside inX() so that inX() variants have their own
overrideable Port IO barrier combinations as __io_pbr() and __io_par() for
actions to be taken before port IO and after port IO read.

Signed-off-by: Sinan Kaya 
---
 include/asm-generic/io.h | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 38a96d1..da51092 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -432,7 +432,12 @@ static inline void writesq(volatile void __iomem *addr, 
const void *buffer,
 #define inb inb
 static inline u8 inb(unsigned long addr)
 {
-   return readb(PCI_IOBASE + addr);
+   u8 val;
+
+   __io_pbr();
+   val = __raw_readb(PCI_IOBASE + addr);
+   __io_par();
+   return val;
 }
 #endif
 
@@ -440,7 +445,12 @@ static inline u8 inb(unsigned long addr)
 #define inw inw
 static inline u16 inw(unsigned long addr)
 {
-   return readw(PCI_IOBASE + addr);
+   u16 val;
+
+   __io_pbr();
+   val = __le16_to_cpu(__raw_readw(PCI_IOBASE + addr));
+   __io_par();
+   return val;
 }
 #endif
 
@@ -448,7 +458,12 @@ static inline u16 inw(unsigned long addr)
 #define inl inl
 static inline u32 inl(unsigned long addr)
 {
-   return readl(PCI_IOBASE + addr);
+   u32 val;
+
+   __io_pbr();
+   val = __le32_to_cpu(__raw_readl(PCI_IOBASE + addr));
+   __io_par();
+   return val;
 }
 #endif
 
-- 
2.7.4



[PATCH v3 3/5] io: define stronger ordering for the default writeX() implementation

2018-04-04 Thread Sinan Kaya
The default implementation of mapping writeX() to __raw_writeX() is wrong.
writeX() has stronger ordering semantics. Compiler is allowed to reorder
memory writes against __raw_writeX().

Use the previously defined __io_aw() and __io_bw() macros to harden
code generation according to architecture support.

Signed-off-by: Sinan Kaya 
---
 include/asm-generic/io.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index fc554af..ca268d9 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -207,7 +207,9 @@ static inline u64 readq(const volatile void __iomem *addr)
 #define writeb writeb
 static inline void writeb(u8 value, volatile void __iomem *addr)
 {
+   __io_bw();
__raw_writeb(value, addr);
+   __io_aw();
 }
 #endif
 
@@ -215,7 +217,9 @@ static inline void writeb(u8 value, volatile void __iomem 
*addr)
 #define writew writew
 static inline void writew(u16 value, volatile void __iomem *addr)
 {
+   __io_bw();
__raw_writew(cpu_to_le16(value), addr);
+   __io_aw();
 }
 #endif
 
@@ -223,7 +227,9 @@ static inline void writew(u16 value, volatile void __iomem 
*addr)
 #define writel writel
 static inline void writel(u32 value, volatile void __iomem *addr)
 {
+   __io_bw();
__raw_writel(__cpu_to_le32(value), addr);
+   __io_aw();
 }
 #endif
 
@@ -232,7 +238,9 @@ static inline void writel(u32 value, volatile void __iomem 
*addr)
 #define writeq writeq
 static inline void writeq(u64 value, volatile void __iomem *addr)
 {
+   __io_bw();
__raw_writeq(__cpu_to_le64(value), addr);
+   __io_aw();
 }
 #endif
 #endif /* CONFIG_64BIT */
-- 
2.7.4



Re: [PATCH v4 5/9] vsprintf: Factor out %p[iI] handler as ip_addr_string()

2018-04-04 Thread Sergey Senozhatsky
On (04/04/18 10:58), Petr Mladek wrote:
> 
> Also it is better to warn about unknown specifier instead of falling
> back to the %p behavior. It will help people to understand what is
> going wrong. They expect the IP address and not a pointer anyway
> in this situation.
>

May be. If one sees a hashed value where IP address/device name/etc
was meant to be then it's already a sign that something is wrong.
Those WARN_ONCE that you have added make things simpler, I agree.
A quick question, what happens on !CONFIG_BUG systems (where we have
no_printk() WARN)?

-ss


[GIT PULL] first round of SCSI updates for the 4.16+ merge window

2018-04-04 Thread James Bottomley
This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc, ufs,
mpt3sas, hisi_sas.  In addition we have removed several really old
drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed the old
scsi_module.c initialization from all remaining drivers.  Plus an
assortment of bug fixes, initialization errors and other minor fixes.

This time there's a really nasty merge between the fixes and misc
branches of the SCSI tree which I've resolved in the merge commit: the
non obvious part is that you have to remove some lines from
qla2xxx/qla_gs.c to avoid a compile failure which would cause bisection
 problems, so I've done that and documented it in the merge commit.  If
you'd like to do it yourself, let me know and I'll send the two
branches separately.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-for-linus

The short changelog is:

Adrian Hunter (1):
  scsi: ufs: Add support for Auto-Hibernate Idle Timer

Arnd Bergmann (7):
  scsi: mpt3sas: clarify mmio pointer types
  scsi: qedi: fix build regression
  scsi: sym53c416: avoid section mismatch with LTO
  scsi: NCR53c406a: avoid section mismatch with LTO
  scsi: qedf: use correct strncpy() size
  scsi: qedf: fix LTO-enabled build
  scsi: qedi: fix building with LTO

Bart Van Assche (15):
  scsi: ufs: Fix kernel-doc errors and warnings
  scsi: sd_zbc: Fix sd_zbc_get_seq_zones() kernel-doc header
  scsi: libsas: Fix kernel-doc headers
  scsi: core: Reduce number of scsi_test_unit_ready() retries
  scsi: core: Move the eh_deadline module parameter definition
  scsi: core: scmd_eh_abort_handler(): Add a comment
  scsi: pmcraid: Use sgl_alloc_order() and sgl_free_order()
  scsi: pmcraid: Remove an unused structure member
  scsi: ipr: Use sgl_alloc_order() and sgl_free_order()
  scsi: scsi_debug: Simplify request tag decoding
  scsi: qla2xxx: Fix function argument descriptions
  scsi: qla4xxx: Move an array from a .h into a .c file
  scsi: qla4xxx: Remove unused symbols
  scsi: qla2xxx: Remove unused symbols
  scsi: qla2xxx: Use %p for printing pointers

Ching Huang (4):
  scsi: arcmsr: Change driver version to v1.40.00.05-20180309
  scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready
  scsi: arcmsr: Handle adapter removed due to thunderbolt cable 
disconnection.
  scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for 
adapter hot-plug

Christoph Hellwig (11):
  scsi: remove the old scsi_module.c initialization model
  scsi: remove the sym53c416 driver
  scsi: remove the NCR53c406a driver
  scsi: remove the fdomain and fdomain_cs drivers
  scsi: mvme147: stop using scsi_module.c
  scsi: esas2r: remove initialization / cleanup dead wood
  scsi: core: unexport scsi_host_set_state
  scsi: documentation: remove ChangeLog.1992-1997
  scsi: aha1740: stop using scsi_unregister
  scsi: ips: don't set .detect and .release in the host template
  scsi: dpt_i2o: stop using scsi_unregister

Colin Ian King (7):
  scsi: qla2xxx: fix spelling mistake: "existant" -> "existent"
  scsi: lpfc: make several unions static, fix non-ANSI prototype
  scsi: scsi_transport_spi: make two const arrays static, shrinks object 
size
  scsi: pmcraid: remove redundant initializations of pointer 'ioadl'
  scsi: isci: remove redundant initialization to 'bit'
  scsi: libfc: remove redundant initialization of 'disc'
  scsi: qedf: remove redundant initialization of 'fcport'

Dan Carpenter (3):
  scsi: dpt_i2o: use after free in __adpt_reset()
  scsi: dpt_i2o: use after free in adpt_release()
  scsi: atp870u: 64 bit bug in atp885_init()

Darren Trapp (10):
  scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling
  scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset
  scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY
  scsi: qla2xxx: Remove nvme_done_list
  scsi: qla2xxx: Return busy if rport going away
  scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change
  scsi: qla2xxx: Add FC-NVMe abort processing
  scsi: qla2xxx: Add changes for devloss timeout in driver
  scsi: qla2xxx: Set IIDMA and fcport state before 
qla_nvme_register_remote()
  scsi: qla2xxx: Restore ZIO threshold setting

Don Brace (1):
  scsi: smartpqi: update driver version

Douglas Gilbert (2):
  scsi: core: Make SCSI Status CONDITION MET equivalent to GOOD
  scsi: scsi_debug: implement IMMED bit

Finn Thain (1):
  scsi: jazz_esp, sun3x_esp: Pass struct device pointer in dma calls

Geert Uytterhoeven (1):
  scsi: hisi_sas: Remove depends on HAS_DMA in case of platform dependency

Hannes Reinecke (1):
  scsi: raid_class: Add 'JBOD' RAID level

James Smart (43):
  scsi: lpfc: Change Copyright of 12.0.0.1 modified files to 2018
  scsi: lpfc: update driver 

Re: [PATCH] uts_namespace: Move boot_id in uts namespace

2018-04-04 Thread Marian Marinov
On 04/04/2018 07:02 PM, Eric W. Biederman wrote:
> Angel Shtilianov  writes:
> 
>> Currently the same boot_id is reported for all containers running
>> on a host node, including the host node itself. Even after restarting
>> a container it will still have the same persistent boot_id.
>>
>> This can cause troubles in cases where you have multiple containers
>> from the same cluster on one host node. The software inside each
>> container will get the same boot_id and thus fail to join the cluster,
>> after the first container from the node has already joined.
>>
>> UTS namespace on other hand keeps the machine specific data, so it
>> seems to be the correct place to move the boot_id and instantiate it,
>> so each container will have unique id for its own boot lifetime, if
>> it has its own uts namespace.
> 
> Technically this really needs to use the sysctl infrastructure that
> allows you to register different files in different namespaces.  That
> way the value you read from proc_do_uuid will be based on who opens the
> file not on who is reading the file.

Ok, so would you accept a patch that reimplements boot_id trough the sysctl 
infrastructure?

> Practically why does a bind mount on top of boot_id work?  What makes
> this a general problem worth solving in the kernel?  Why is hiding the
> fact that you are running the same instance of the same kernel a useful
> thing? That is the reality.

The problem is, that the distros do not know that they are in container and 
don't know that they have to bind mount something on top of boot_id.
You need to tell Docker, LXC/LXD and all other container runtimes that they 
need to do this bind mount for boot_id.
I consider this to be a general issue, that lacks good general solution in 
userspace.
The kernel is providing this boot_id interface, but it is giving wrong data in 
the context of containers.
Proposing to fix this problem in userspace seams like ignoring the issue.
You could have said to the Consul guys, that they should simply stop using 
boot_id, because it doesn't work correctly on containers.

Marian

> 
> Eric
> 
> 
> 
> 
>> Signed-off-by: Angel Shtilianov 
>> ---
>>  drivers/char/random.c   | 4 
>>  include/linux/utsname.h | 1 +
>>  kernel/utsname.c| 4 +++-
>>  3 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index ec42c8bb9b0d..e05daf7f38f4 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -1960,6 +1960,10 @@ static int proc_do_uuid(struct ctl_table *table, int 
>> write,
>>  unsigned char buf[64], tmp_uuid[16], *uuid;
>>  
>>  uuid = table->data;
>> +#ifdef CONFIG_UTS_NS
>> +if (!!uuid && (uuid == (unsigned char *)sysctl_bootid))
>> +uuid = current->nsproxy->uts_ns->sysctl_bootid;
>> +#endif
>>  if (!uuid) {
>>  uuid = tmp_uuid;
>>  generate_random_uuid(uuid);
>> diff --git a/include/linux/utsname.h b/include/linux/utsname.h
>> index c8060c2ecd04..f704aca3e95a 100644
>> --- a/include/linux/utsname.h
>> +++ b/include/linux/utsname.h
>> @@ -27,6 +27,7 @@ struct uts_namespace {
>>  struct user_namespace *user_ns;
>>  struct ucounts *ucounts;
>>  struct ns_common ns;
>> +char sysctl_bootid[16];
>>  } __randomize_layout;
>>  extern struct uts_namespace init_uts_ns;
>>  
>> diff --git a/kernel/utsname.c b/kernel/utsname.c
>> index 913fe4336d2b..f1749cdcd341 100644
>> --- a/kernel/utsname.c
>> +++ b/kernel/utsname.c
>> @@ -34,8 +34,10 @@ static struct uts_namespace *create_uts_ns(void)
>>  struct uts_namespace *uts_ns;
>>  
>>  uts_ns = kmalloc(sizeof(struct uts_namespace), GFP_KERNEL);
>> -if (uts_ns)
>> +if (uts_ns) {
>>  kref_init(&uts_ns->kref);
>> +memset(uts_ns->sysctl_bootid, 0, 16);
>> +}
>>  return uts_ns;
>>  }
> 


-- 
Marian Marinov
Co-founder & CTO of Kyup.com
Jabber/GTalk: hack...@jabber.org
IRQ: 7556201
IRC: hackman @ irc.freenode.net
Mobile: +359 886 660 270


Re: [PATCH v4 7/9] vsprintf: Factor out %pO handler as kobject_string()

2018-04-04 Thread Sergey Senozhatsky
On (04/05/18 08:35), Sergey Senozhatsky wrote:
> 
> Was this your intention?
> 

Ah, it was. You mentioned it in the commit message

: In fact, this avoids leaking the address when invalid %pO format
: specifier is used. The old code fallen back to printing the
: non-hashed value.

Is it so? The default is

/* default is to _not_ leak addresses, hash before printing */
return ptr_to_id(buf, end, ptr, spec);

-ss


[PATCH] [RESEND] drm/i915/dp: Send DPCD ON for MST before phy_up

2018-04-04 Thread Lyude Paul
As it turns out, the aux block being off was not the real problem here,
as transition from D3 to D0 is mandated by the DP spec to take a maximum
of 1ms, whereas we're allowed a 100ms timeframe to respond to ESI irqs.
The real problem here is a bit more subtle.

When doing a modeset where the problem of the sink timing out to our
sideband requests when transitioning from D3 to D0 occurs, the timeout
is from the aux block not coming on. However, nothing else times out
other than the initial phy_up message because the DPCD on call in
intel_ddi_enable_dp() ends up waking up the AUX block on the hub, not
the phy_up sideband message. This means that the real fix we need is to
use the DPMS on before sending a phy_up to ensure that the hub is ready
to accept sideband messages.

Signed-off-by: Lyude Paul 
Cc: Dhinakaran Pandiyan 
Cc: Ville Syrjälä 
Cc: Laura Abbott 
Cc: sta...@vger.kernel.org
Fixes: ad260ab32a4d9 ("drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.")
---
Sorry for all the spam guys! This is my last attempt at getting this
patch to actually come up on intel-gfx's patchwork page instead of
dri-devel. I have no idea what is going on with patchwork right now :|,
but maybe removing dri-devel from the CC entirely will help.


 drivers/gpu/drm/i915/intel_ddi.c| 6 +-
 drivers/gpu/drm/i915/intel_dp_mst.c | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
index a6672a9abd85..9bd675f73f7b 100644
--- a/drivers/gpu/drm/i915/intel_ddi.c
+++ b/drivers/gpu/drm/i915/intel_ddi.c
@@ -2324,7 +2324,11 @@ static void intel_ddi_pre_enable_dp(struct intel_encoder 
*encoder,
intel_prepare_dp_ddi_buffers(encoder, crtc_state);
 
intel_ddi_init_dp_buf_reg(encoder);
-   intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
+   /* for MST, we do DPMS_ON outside of here so that DPMS_ON can happen
+* before drm_dp_send_power_updown_phy()
+*/
+   if (!intel_dp->is_mst)
+   intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
intel_dp_start_link_train(intel_dp);
if (port != PORT_A || INTEL_GEN(dev_priv) >= 9)
intel_dp_stop_link_train(intel_dp);
diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c 
b/drivers/gpu/drm/i915/intel_dp_mst.c
index c3de0918ee13..eff9a4eae1f0 100644
--- a/drivers/gpu/drm/i915/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/intel_dp_mst.c
@@ -223,6 +223,7 @@ static void intel_mst_pre_enable_dp(struct intel_encoder 
*encoder,
 
DRM_DEBUG_KMS("active links %d\n", intel_dp->active_mst_links);
 
+   intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
drm_dp_send_power_updown_phy(&intel_dp->mst_mgr, connector->port, true);
if (intel_dp->active_mst_links == 0)
intel_dig_port->base.pre_enable(&intel_dig_port->base,
-- 
2.14.3



  1   2   3   4   5   6   7   8   9   10   >