Re: [PATCH v3] KVM: halt-polling: poll if emulated lapic timer will fire soon
On 05/24/2016 04:25 AM, Wanpeng Li wrote:
> 2016-05-24 10:19 GMT+08:00 Wanpeng Li :
>> 2016-05-24 2:01 GMT+08:00 David Matlack :
>>> On Sun, May 22, 2016 at 5:42 PM, Wanpeng Li wrote:
>>>> From: Wanpeng Li
>>>
>>> I'm ok with this patch, but I'd like to better understand the target
>>> workloads. What type of workloads do you expect to benefit from this?
>>
>> dynticks guests I think is one of the workloads which can benefit;
>> there are lots of upcoming timer fires captured by my feature, even
>> during TCP testing. And also the workload of Yang's.
>
> Do you think I should add a module parameter to enable/disable it
> during module insmod, or is the current patch fine?

What about getting rid of this hunk

-        val = 1;
+        val = halt_poll_ns_base;

and then renaming "halt_poll_ns_base" into "halt_poll_ns_timer", which
can be changed as a module parameter?

I also experimented with an s390 implementation, which seems pretty
straightforward. It is probably something like the following (whitespace
damaged due to copy/paste) and needs more testing.

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 38bbc98..a97739d 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -682,6 +682,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
                                  struct kvm_async_pf *work);

 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
+extern u64 kvm_s390_timer_remaining(struct kvm_vcpu *vcpu);
 extern char sie_exit;

 static inline void kvm_arch_hardware_disable(void) {}
@@ -699,7 +700,7 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
 {
-        return -1ULL;
+        return kvm_s390_timer_remaining(vcpu);
 }

 void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 5a80af7..5b209a2 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -936,6 +936,17 @@ static u64 __calculate_sltime(struct kvm_vcpu *vcpu)
         return sltime;
 }

+
+u64 kvm_s390_timer_remaining(struct kvm_vcpu *vcpu)
+{
+        u64 result;
+
+        preempt_disable();
+        result = __calculate_sltime(vcpu);
+        preempt_enable();
+        return result;
+}
+
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
 {
         u64 sltime;
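For context, a minimal sketch of how the generic halt-polling path could
consume kvm_arch_timer_remaining(); the halt_poll_ns_timer parameter and
its exact placement inside kvm_vcpu_block() are assumptions based on the
discussion above, not the merged code.

/* sketch only: poll if the emulated timer fires within the proposed
 * halt_poll_ns_timer window (assumed to be a module parameter, in ns) */
static unsigned int halt_poll_ns_timer = 10000;
module_param(halt_poll_ns_timer, uint, 0644);

static bool kvm_timer_fires_soon(struct kvm_vcpu *vcpu)
{
        /* the arch helpers return -1ULL when no timer is armed */
        return kvm_arch_timer_remaining(vcpu) < halt_poll_ns_timer;
}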
Re: [PATCH 03/16] sched/fair: Disregard idle task wakee_flips in wake_wide
On Mon, May 23, 2016 at 01:00:10PM +0100, Morten Rasmussen wrote:
> On Mon, May 23, 2016 at 01:12:07PM +0200, Mike Galbraith wrote:
> > On Mon, 2016-05-23 at 11:58 +0100, Morten Rasmussen wrote:
> > > wake_wide() is based on task wakee_flips of the waker and the wakee to
> > > decide whether an affine wakeup is desirable. On lightly loaded systems
> > > the waker is frequently the idle task (pid=0) which can accumulate a lot
> > > of wakee_flips in that scenario. It makes little sense to prevent affine
> > > wakeups on an idle cpu due to the idle task wakee_flips, so it makes
> > > more sense to ignore them in wake_wide().
> >
> > You sure? What's the difference between a task flipping enough to
> > warrant spreading the load, and an interrupt source doing the same?
> > I've both witnessed firsthand, and received user confirmation of this
> > very thing improving utilization.
>
> Right, I didn't consider the interrupt source scenario, my fault.
>
> The problem then seems to be distinguishing truly idle and busy doing
> interrupts. The issue that I observe is that wake_wide() likes pushing
> tasks around in lightly loaded scenarios, which isn't desirable for power
> management. Selecting the same cpu again may potentially let others
> reach deeper C-states.
>
> With that in mind I will see if I can do better. Suggestions are
> welcome :-)

On mobile, the LLC factor is as small as 2 to 4 and may easily be
exceeded, so decaying wakee_flips at HZ may be too slow.

> > > cc: Ingo Molnar
> > > cc: Peter Zijlstra
> > >
> > > Signed-off-by: Morten Rasmussen
> > > ---
> > >  kernel/sched/fair.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index c49e25a..0fe3020 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5007,6 +5007,10 @@ static int wake_wide(struct task_struct *p)
> > >          unsigned int slave = p->wakee_flips;
> > >          int factor = this_cpu_read(sd_llc_size);
> > >
> > > +        /* Don't let the idle task prevent affine wakeups */
> > > +        if (is_idle_task(current))
> > > +                return 0;
> > > +
> > >          if (master < slave)
> > >                  swap(master, slave);
> > >          if (slave < factor || master < slave * factor)
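For reference, the wakee_flips bookkeeping and the once-per-second decay
being discussed live in record_wakee() in kernel/sched/fair.c; roughly
(paraphrased from this era of the scheduler, so treat it as a sketch
rather than the exact source):

static void record_wakee(struct task_struct *p)
{
        /* decay: halve the waker's flip count at most once per HZ */
        if (time_after(jiffies, current->wakee_flip_decay_ts + HZ)) {
                current->wakee_flips >>= 1;
                current->wakee_flip_decay_ts = jiffies;
        }

        /* a "flip" is waking a different wakee than last time */
        if (current->last_wakee != p) {
                current->last_wakee = p;
                current->wakee_flips++;
        }
}

With an LLC factor of only 2-4, a handful of flips accumulated between
decays is already enough to push wake_wide() past its threshold, which
is the concern raised above.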
Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up
On Mon, 2016-05-23 at 11:58 +0100, Morten Rasmussen wrote:
> Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> configurations SD_WAKE_AFFINE is only desirable if the waking task's
> compute demand (utilization) is suitable for the cpu capacities
> available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
> balancing take over (find_idlest_{group,cpu}()).
>
> The assumption is that SD_WAKE_AFFINE is never set for a sched_domain
> containing cpus with different capacities. This is enforced by a
> previous patch based on the SD_ASYM_CPUCAPACITY flag.
>
> Ideally, we shouldn't set 'want_affine' in the first place, but we don't
> know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
> traversing them.

This doesn't look like it's restricted to big/little setups, so it could
overrule wake_wide() wanting to NAK an x-node pull.

> cc: Ingo Molnar
> cc: Peter Zijlstra
>
> Signed-off-by: Morten Rasmussen
> ---
>  kernel/sched/fair.c | 28 +++++++++++++++++++++++++++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 564215d..ce44fa7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -114,6 +114,12 @@ unsigned int __read_mostly sysctl_sched_shares_window = 1000UL;
>  unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
>  #endif
>
> +/*
> + * The margin used when comparing utilization with cpu capacity:
> + * util * 1024 < capacity * margin
> + */
> +unsigned int capacity_margin = 1280; /* ~20% */
> +
>  static inline void update_load_add(struct load_weight *lw, unsigned long inc)
>  {
>          lw->weight += inc;
> @@ -5293,6 +5299,25 @@ static int cpu_util(int cpu)
>          return (util >= capacity) ? capacity : util;
>  }
>
> +static inline int task_util(struct task_struct *p)
> +{
> +        return p->se.avg.util_avg;
> +}
> +
> +static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> +{
> +        long delta;
> +        long prev_cap = capacity_of(prev_cpu);
> +
> +        delta = cpu_rq(cpu)->rd->max_cpu_capacity - prev_cap;
> +
> +        /* prev_cpu is fairly close to max, no need to abort wake_affine */
> +        if (delta < prev_cap >> 3)
> +                return 0;
> +
> +        return prev_cap * 1024 < task_util(p) * capacity_margin;
> +}
> +
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> @@ -5316,7 +5341,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>
>          if (sd_flag & SD_BALANCE_WAKE) {
>                  record_wakee(p);
> -                want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
> +                want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
> +                              && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>          }
>
>          rcu_read_lock();
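To make the margin concrete, a worked example with hypothetical
big.LITTLE numbers: capacity_margin = 1280 means wake_cap() aborts the
affine path once task_util exceeds 1024/1280 = 80% of prev_cpu's
capacity, i.e. once less than ~20% headroom remains.

little prev_cpu: prev_cap = 430, max_cpu_capacity = 1024, task_util = 400
  delta = 1024 - 430 = 594 >= (430 >> 3) = 53   -> no early exit
  430 * 1024 = 440320 < 400 * 1280 = 512000     -> wake_cap() returns 1,
                                                   fall back to
                                                   find_idlest_group()/cpu()

big prev_cpu: prev_cap = 1024, max_cpu_capacity = 1024
  delta = 0 < (1024 >> 3) = 128                 -> wake_cap() returns 0,
                                                   wake_affine stays allowed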
Re: [PATCH v3] KVM: halt-polling: poll if emulated lapic timer will fire soon
2016-05-24 10:19 GMT+08:00 Wanpeng Li :
> 2016-05-24 2:01 GMT+08:00 David Matlack :
>> On Sun, May 22, 2016 at 5:42 PM, Wanpeng Li wrote:
>>> From: Wanpeng Li
>>
>> I'm ok with this patch, but I'd like to better understand the target
>> workloads. What type of workloads do you expect to benefit from this?
>
> dynticks guests I think is one of the workloads which can benefit;
> there are lots of upcoming timer fires captured by my feature, even
> during TCP testing. And also the workload of Yang's.
>
>>
>>>
>>> If an emulated lapic timer will fire soon (within 10us, the base of
>>> dynamic halt-polling and the lower end of message passing workload
>>> latency, TCP_RR's poll time < 10us), we can treat it as a short halt
>>> and poll to wait for it to fire. The fire callback apic_timer_fn()
>>> will set KVM_REQ_PENDING_TIMER, and this flag will be checked during
>>> the busy poll. This avoids the context switch overhead and the
>>> latency of waking up the vCPU.
>>>
>>> This feature is slightly different from the current advance expiration
>>> approach. Advance expiration relies on the vCPU running (it polls
>>> before vmentry). But in some cases the timer interrupt may be blocked
>>> by another thread (i.e., the IF bit is clear) and the vCPU cannot be
>>> scheduled to run immediately. So even if the timer is advanced, the
>>> vCPU may still see the latency. Polling is different: it ensures that
>>> the vCPU is aware of the timer expiration before being scheduled out.
>>>
>>> iperf TCP gets ~6% bandwidth improvement.
>>
>> I think my question got lost in the previous thread :). Can you
>> explain why TCP bandwidth improves with this patch?

Please forget the TCP stuff. I ran the lmbench ctx switch benchmark after
echo HRTICK > /sys/kernel/debug/sched_features
in a dynticks guest:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host   OS            2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                     ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
------ ------------- ------ ------ ------ ------ ------ ------- -------
kernel Linux 4.6.0+  7.9800 11.0   10.8   14.6   9.4300 13.0    10.2    vanilla
kernel Linux 4.6.0+  15.3   13.6   10.7   12.5   9.1    2.8     7.38000 poll

Regards,
Wanpeng Li
Re: [PATCH v3] dell-rbtn: Ignore ACPI notifications if device is suspended
On Tuesday 24 May 2016 06:48:41 Andrei Borzenkov wrote:
> 24.05.2016 02:03, Gabriele Mazzotta wrote:
> > On 24/05/2016 00:22, Pali Rohár wrote:
> >> On Tuesday 24 May 2016 00:17:15 Darren Hart wrote:
> >>> On Tue, May 24, 2016 at 12:06:03AM +0200, Pali Rohár wrote:
> >>>> On Monday 23 May 2016 23:26:55 Darren Hart wrote:
> >>>>> I've queued this. Thanks for your patience.
> >>>>
> >>>> Ok, in that case I would update the comments in the patch to make it
> >>>> more clear what the code is doing.
> >>>
> >>> I thought I had your approval on this one Pali. Apologies if that was
> >>> not the case. Did I miss a change request from you?
> >>>
> >>> If so, please point me at it, and I'll dequeue this one and wait for
> >>> an updated one.
> >>
> >> I just wanted a review of that code from somebody else before deciding
> >> whether to accept it or not, because I was not sure if it was OK...
> >>
> >> But there was no objection, so the patch is OK.
> >>
> >> And I pointed out that the patch could have better comments to describe
> >> what it is doing, as at first I was confused.
> >>
> >> So I believe that you can update the patch in your queue with a new
> >> version which just changes comments in the source code (without
> >> functional changes).
> >
> > Something such as the following?
> > Feel free to reword the comments if you have something better in mind.
> >
> > ---
> >  drivers/platform/x86/dell-rbtn.c | 56 ++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> >
> > diff --git a/drivers/platform/x86/dell-rbtn.c b/drivers/platform/x86/dell-rbtn.c
> > index 331d63c..e0208ba 100644
> > --- a/drivers/platform/x86/dell-rbtn.c
> > +++ b/drivers/platform/x86/dell-rbtn.c
> > @@ -28,6 +28,7 @@ struct rbtn_data {
> >          enum rbtn_type type;
> >          struct rfkill *rfkill;
> >          struct input_dev *input_dev;
> > +        bool suspended;
> >  };
> >
> > @@ -235,9 +236,55 @@ static const struct acpi_device_id rbtn_ids[] = {
> >          { "", 0 },
> >  };
> >
> > +#ifdef CONFIG_PM_SLEEP
> > +static void ACPI_SYSTEM_XFACE rbtn_acpi_clear_flag(void *context)

I would rename this function to rbtn_clear_suspended_flag.

> > +{
> > +        struct rbtn_data *rbtn_data = context;
> > +
> > +        rbtn_data->suspended = false;
> > +}
> > +
> > +static int rbtn_suspend(struct device *dev)
> > +{
> > +        struct acpi_device *device = to_acpi_device(dev);
> > +        struct rbtn_data *rbtn_data = acpi_driver_data(device);
> > +
> > +        rbtn_data->suspended = true;
> > +
> > +        return 0;
> > +}
> > +
> > +static int rbtn_resume(struct device *dev)
> > +{
> > +        struct acpi_device *device = to_acpi_device(dev);
> > +        struct rbtn_data *rbtn_data = acpi_driver_data(device);
> > +        acpi_status status;
> > +
> > +        /*
> > +         * Upon resume, some BIOSes autonomously send an ACPI notification
> > +         * that triggers an unwanted input event. In order to ignore it,
> > +         * we use a flag that we set at suspend and clear once we have
> > +         * received the extra notification. Since ACPI notifications are
> > +         * delivered asynchronously to drivers, we clear the flag from the
> > +         * workqueue used to deliver the notifications. This should be enough
> > +         * to guarantee that the flag is cleared only after we received the
> > +         * extra notification, if any.
> > +         */

"guarantee" is a rather strong word here. We really do not know anything
about how and when these notifications are generated by firmware, so we
can only hope. But otherwise this explains what this patch intends to do
(so that even I finally understood it :)

Yes, that's better.

> > +        status = acpi_os_execute(OSL_NOTIFY_HANDLER,
> > +                                 rbtn_acpi_clear_flag, rbtn_data);
> > +        if (ACPI_FAILURE(status))
> > +                rbtn_data->suspended = false;

And here call rbtn_clear_suspended_flag(rbtn_data) instead of the direct
assignment.

> > +
> > +        return 0;
> > +}
> > +#endif
> > +
> > +static SIMPLE_DEV_PM_OPS(rbtn_pm_ops, rbtn_suspend, rbtn_resume);
> > +
> >  static struct acpi_driver rbtn_driver = {
> >          .name = "dell-rbtn",
> >          .ids = rbtn_ids,
> > +        .drv.pm = &rbtn_pm_ops,
> >          .ops = {
> >                  .add = rbtn_add,
> >                  .remove = rbtn_remove,
> > @@ -399,6 +446,15 @@ static void rbtn_notify(struct acpi_device *device, u32 event)
> >  {
> >          struct rbtn_data *rbtn_data = device->driver_data;
> >
> > +        /*
> > +         * Some BIOSes send autonomously a notification at resume.
> > +         * Ignore it to prevent unwanted input events.
> > +         */
> > +        if (rbtn_data->suspended) {
> > +                dev_dbg(&device->dev, "ACPI notification ignored\n");
> > +                return;
> > +        }
> > +
> >          if (event != 0x80) {
> >                  dev_info(&device->dev, "Received unknown event (0x%x)\n",
> >                           event);

--
Pali Rohár
pali.ro...@gmail.com
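Folding Pali's two comments into the patch, the resume path would look
roughly like this (a sketch of the suggested rename, untested):

static void ACPI_SYSTEM_XFACE rbtn_clear_suspended_flag(void *context)
{
        struct rbtn_data *rbtn_data = context;

        rbtn_data->suspended = false;
}

static int rbtn_resume(struct device *dev)
{
        struct acpi_device *device = to_acpi_device(dev);
        struct rbtn_data *rbtn_data = acpi_driver_data(device);
        acpi_status status;

        /* clear the flag from the ACPI notify workqueue, see above */
        status = acpi_os_execute(OSL_NOTIFY_HANDLER,
                                 rbtn_clear_suspended_flag, rbtn_data);
        if (ACPI_FAILURE(status))
                rbtn_clear_suspended_flag(rbtn_data);

        return 0;
}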
Re: [PATCH v3] KVM: halt-polling: poll if emulated lapic timer will fire soon
2016-05-24 14:59 GMT+08:00 Christian Borntraeger :
> On 05/24/2016 04:25 AM, Wanpeng Li wrote:
>> 2016-05-24 10:19 GMT+08:00 Wanpeng Li :
>>> 2016-05-24 2:01 GMT+08:00 David Matlack :
>>>> On Sun, May 22, 2016 at 5:42 PM, Wanpeng Li wrote:
>>>>> From: Wanpeng Li
>>>>
>>>> I'm ok with this patch, but I'd like to better understand the target
>>>> workloads. What type of workloads do you expect to benefit from this?
>>>
>>> dynticks guests I think is one of the workloads which can benefit;
>>> there are lots of upcoming timer fires captured by my feature, even
>>> during TCP testing. And also the workload of Yang's.
>>
>> Do you think I should add a module parameter to enable/disable it
>> during module insmod, or is the current patch fine?
>
> What about getting rid of this hunk
>
> -        val = 1;
> +        val = halt_poll_ns_base;
>
> and then renaming "halt_poll_ns_base" into "halt_poll_ns_timer", which
> can be changed as a module parameter?

Good point,

>
> I also experimented with an s390 implementation, which seems pretty
> straightforward. It is probably something like the following (whitespace
> damaged due to copy/paste) and needs more testing.
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 38bbc98..a97739d 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -682,6 +682,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>                                   struct kvm_async_pf *work);
>
>  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
> +extern u64 kvm_s390_timer_remaining(struct kvm_vcpu *vcpu);
>  extern char sie_exit;
>
>  static inline void kvm_arch_hardware_disable(void) {}
> @@ -699,7 +700,7 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu)
>  {
> -        return -1ULL;
> +        return kvm_s390_timer_remaining(vcpu);
>  }
>
>  void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu);
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 5a80af7..5b209a2 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -936,6 +936,17 @@ static u64 __calculate_sltime(struct kvm_vcpu *vcpu)
>          return sltime;
>  }
>
> +
> +u64 kvm_s390_timer_remaining(struct kvm_vcpu *vcpu)
> +{
> +        u64 result;
> +
> +        preempt_disable();
> +        result = __calculate_sltime(vcpu);
> +        preempt_enable();
> +        return result;
> +}
> +
>  int kvm_s390_handle_wait(struct kvm_vcpu *vcpu)
>  {
>          u64 sltime;
> --

Regards,
Wanpeng Li
Re: [PATCH 03/16] sched/fair: Disregard idle task wakee_flips in wake_wide
On Mon, May 23, 2016 at 05:42:20PM +0200, Mike Galbraith wrote:
> On Mon, 2016-05-23 at 15:10 +0100, Morten Rasmussen wrote:
> > On Mon, May 23, 2016 at 03:00:46PM +0200, Mike Galbraith wrote:
> > > On Mon, 2016-05-23 at 13:00 +0100, Morten Rasmussen wrote:
> > >
> > > > The problem then seems to be distinguishing truly idle and busy doing
> > > > interrupts. The issue that I observe is that wake_wide() likes pushing
> > > > tasks around in lightly loaded scenarios, which isn't desirable for
> > > > power management. Selecting the same cpu again may potentially let
> > > > others reach deeper C-states.
> > > >
> > > > With that in mind I will see if I can do better. Suggestions are
> > > > welcome :-)
> > >
> > > None here. For big boxen that are highly idle, you'd likely want to
> > > shut down nodes and consolidate load, but otoh, all that slows response
> > > to burst, which I hate. I prefer race to idle, let power gating do its
> > > job. If I had a server farm with enough capacity vs load variability
> > > to worry about, I suspect I'd become highly interested in routing.
> >
> > I don't disagree for systems of that scale, but at the other end of the
> > spectrum it is a single SoC we are trying to squeeze the best possible
> > mileage out of. That implies optimizing for power gating to reach deeper
> > C-states when possible by consolidating idle time and grouping
> > idle cpus. Migrating tasks unnecessarily isn't helping us achieve
> > that, unfortunately :-(
>
> Yup, the goals are pretty much mutually exclusive. For your goal, you
> want more of an allocator-like behavior, where stacking of tasks is bad
> only once there's too much overlap (ie latency, defining it is hard), and
> allocation always has the same order (expand rightward or such for the
> general case, adding little/big complexity for arm). For mine, current
> behavior is good, avoid stacking like the plague.

I'd be happy to have a switch to select either goal.
Re: [PATCH v3] KVM: halt-polling: poll if emulated lapic timer will fire soon
2016-05-24 14:59 GMT+08:00 Christian Borntraeger :
> On 05/24/2016 04:25 AM, Wanpeng Li wrote:
>> 2016-05-24 10:19 GMT+08:00 Wanpeng Li :
>>> 2016-05-24 2:01 GMT+08:00 David Matlack :
>>>> On Sun, May 22, 2016 at 5:42 PM, Wanpeng Li wrote:
>>>>> From: Wanpeng Li
>>>>
>>>> I'm ok with this patch, but I'd like to better understand the target
>>>> workloads. What type of workloads do you expect to benefit from this?
>>>
>>> dynticks guests I think is one of the workloads which can benefit;
>>> there are lots of upcoming timer fires captured by my feature, even
>>> during TCP testing. And also the workload of Yang's.
>>
>> Do you think I should add a module parameter to enable/disable it
>> during module insmod, or is the current patch fine?
>
> What about getting rid of this hunk
>
> -        val = 1;
> +        val = halt_poll_ns_base;
>
> and then renaming "halt_poll_ns_base" into "halt_poll_ns_timer", which
> can be changed as a module parameter?

Good idea; actually I remember Paolo mentioned changing this into a
module parameter in another thread.

> I also experimented with an s390 implementation, which seems pretty
> straightforward. It is probably something like the following (whitespace
> damaged due to copy/paste) and needs more testing.

Great work, Christian. I will send out a new version w/ the module
parameter.

Regards,
Wanpeng Li
Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up
On Tue, 2016-05-24 at 09:03 +0200, Mike Galbraith wrote:
> This doesn't look like it's restricted to big/little setups, so it could
> overrule wake_wide() wanting to NAK an x-node pull.

Bah, nevermind.
Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()
On Mon 23-05-16 17:14:19, Oleg Nesterov wrote:
> On 05/23, Michal Hocko wrote:
[...]
> > Could you add some tracing and see what are the numbers above?
>
> with the patch below I can press Ctrl-C when it hangs, this breaks the
> endless loop and the output looks like
>
> vmscan: ZONE=8189f180 0 scanned=0 pages=6
> vmscan: ZONE=8189eb00 0 scanned=1 pages=0
> ...
> vmscan: ZONE=8189eb00 0 scanned=2 pages=1
> vmscan: ZONE=8189f180 0 scanned=4 pages=6
> ...
> vmscan: ZONE=8189f180 0 scanned=4 pages=6
> vmscan: ZONE=8189f180 0 scanned=4 pages=6
>
> the numbers are always small.

Small, but scanned is not 0 and is constant, which means either it gets
reset repeatedly (something gets freed) or we have stopped scanning.
Which pattern can you see? I assume that the swap space is full at the
time (could you add get_nr_swap_pages() to the output?). Also zone->name
would be better than the pointer.

I am trying to reproduce but your test case always hits the OOM killer.
This is in a qemu x86_64 virtual machine:

# free
             total       used       free     shared    buffers     cached
Mem:        490212      96788     393424          0       3196       9976
-/+ buffers/cache:      83616     406596
Swap:       138236      57740      80496

I have tried with much larger swap space but no change except for the run
time of the test, which is expected.

# grep "^processor" /proc/cpuinfo | wc -l
1

[... Skipped several previous attempts ...]
[ 695.215235] vmscan: XXX: zone:DMA32 nr_pages_scanned:0 reclaimable:20
[ 695.215245] vmscan: XXX: zone:DMA32 nr_pages_scanned:0 reclaimable:20
[ 695.215255] vmscan: XXX: zone:DMA32 nr_pages_scanned:0 reclaimable:20
[ 695.215282] vmscan: XXX: zone:DMA32 nr_pages_scanned:1 reclaimable:27
[ 695.215303] vmscan: XXX: zone:DMA32 nr_pages_scanned:5 reclaimable:27
[ 695.215327] vmscan: XXX: zone:DMA32 nr_pages_scanned:18 reclaimable:27
[ 695.215351] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215362] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215373] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215382] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215392] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215402] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215412] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215422] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215431] vmscan: XXX: zone:DMA32 nr_pages_scanned:45 reclaimable:27
[ 695.215442] vmscan: XXX: zone:DMA32 nr_pages_scanned:46 reclaimable:27
[ 695.215462] vmscan: XXX: zone:DMA32 nr_pages_scanned:48 reclaimable:27
[ 695.215482] vmscan: XXX: zone:DMA32 nr_pages_scanned:53 reclaimable:27
[ 695.215504] vmscan: XXX: zone:DMA32 nr_pages_scanned:63 reclaimable:27
[ 695.215528] vmscan: XXX: zone:DMA32 nr_pages_scanned:90 reclaimable:27
[...]
[ 695.215620] vmscan: XXX: zone:DMA32 nr_pages_scanned:91 reclaimable:27
[ 695.215640] vmscan: XXX: zone:DMA32 nr_pages_scanned:94 reclaimable:27
[ 695.215659] vmscan: XXX: zone:DMA32 nr_pages_scanned:100 reclaimable:27
[ 695.215683] vmscan: XXX: zone:DMA32 nr_pages_scanned:113 reclaimable:27
[...]
[ 695.215786] vmscan: XXX: zone:DMA32 nr_pages_scanned:140 reclaimable:27
[ 695.215797] vmscan: XXX: zone:DMA32 nr_pages_scanned:141 reclaimable:27
[ 695.215816] vmscan: XXX: zone:DMA32 nr_pages_scanned:144 reclaimable:27
[ 695.215836] vmscan: XXX: zone:DMA32 nr_pages_scanned:150 reclaimable:27
[ 695.215906] test-oleg invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=0
--
Michal Hocko
SUSE Labs
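A sketch of what the debugging printk could look like with the two
suggested tweaks folded in (zone->name instead of the raw pointer, plus
the remaining swap pages); the exact call site is whatever the original
debugging patch used, so treat this as illustrative only:

printk("vmscan: zone:%s nr_pages_scanned:%lu reclaimable:%lu nr_swap_pages:%ld\n",
       zone->name,
       zone_page_state(zone, NR_PAGES_SCANNED),
       zone_reclaimable_pages(zone),
       get_nr_swap_pages());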
[PATCH v2 2/5] perf config: Reimplement perf_config() using perf_config_set__iter()
Every time perf_config() is called, it reads the config files all over
again (i.e. the user config '~/.perfconfig' and the system config
'$(sysconfdir)/perfconfig'). To avoid this repetitive work, use a config
set that already contains all config key-value pairs: when the new
perf_config() is called for the first time, 'config_set' is initialized,
collecting all configs from the config files, and subsequent calls
iterate over it with perf_config_set__iter(). What callers get from the
new perf_config() is the same as from the old one, minus the repeated
reading of the config files.

Cc: Namhyung Kim
Cc: Jiri Olsa
Cc: Wang Nan
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Masami Hiramatsu
Cc: Alexander Shishkin
Signed-off-by: Taeung Song
---
 tools/perf/util/config.c | 98 ++++++++++++++++++++++++------------------------
 1 file changed, 50 insertions(+), 48 deletions(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 5d01899..487d390 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -28,6 +28,7 @@ static int config_linenr;
 static int config_file_eof;
 
 const char *config_exclusive_filename;
+struct perf_config_set *config_set;
 
 static int get_next_char(void)
 {
@@ -477,54 +478,6 @@ static int perf_config_global(void)
         return !perf_env_bool("PERF_CONFIG_NOGLOBAL", 0);
 }
 
-int perf_config(config_fn_t fn, void *data)
-{
-        int ret = 0, found = 0;
-        const char *home = NULL;
-
-        /* Setting $PERF_CONFIG makes perf read _only_ the given config file. */
-        if (config_exclusive_filename)
-                return perf_config_from_file(fn, config_exclusive_filename, data);
-        if (perf_config_system() && !access(perf_etc_perfconfig(), R_OK)) {
-                ret += perf_config_from_file(fn, perf_etc_perfconfig(),
-                                             data);
-                found += 1;
-        }
-
-        home = getenv("HOME");
-        if (perf_config_global() && home) {
-                char *user_config = strdup(mkpath("%s/.perfconfig", home));
-                struct stat st;
-
-                if (user_config == NULL) {
-                        warning("Not enough memory to process %s/.perfconfig, "
-                                "ignoring it.", home);
-                        goto out;
-                }
-
-                if (stat(user_config, &st) < 0)
-                        goto out_free;
-
-                if (st.st_uid && (st.st_uid != geteuid())) {
-                        warning("File %s not owned by current user or root, "
-                                "ignoring it.", user_config);
-                        goto out_free;
-                }
-
-                if (!st.st_size)
-                        goto out_free;
-
-                ret += perf_config_from_file(fn, user_config, data);
-                found += 1;
-out_free:
-                free(user_config);
-        }
-out:
-        if (found == 0)
-                return -1;
-        return ret;
-}
-
 static struct perf_config_section *find_section(struct list_head *sections,
                                                 const char *section_name)
 {
@@ -705,6 +658,55 @@ struct perf_config_set *perf_config_set__new(void)
         return set;
 }
 
+static int perf_config_set__check(void)
+{
+        if (config_set != NULL)
+                return 0;
+
+        config_set = perf_config_set__new();
+        if (!config_set)
+                return -1;
+
+        return 0;
+}
+
+static int perf_config_set__iter(struct perf_config_set *set, config_fn_t fn, void *data)
+{
+        struct perf_config_section *section;
+        struct perf_config_item *item;
+        struct list_head *sections;
+        char key[BUFSIZ];
+
+        if (set == NULL)
+                return -1;
+
+        sections = &set->sections;
+        if (list_empty(sections))
+                return -1;
+
+        list_for_each_entry(section, sections, node) {
+                list_for_each_entry(item, &section->items, node) {
+                        char *value = item->value;
+
+                        if (value) {
+                                scnprintf(key, sizeof(key), "%s.%s",
+                                          section->name, item->name);
+                                if (fn(key, value, data) < 0)
+                                        return -1;
+                        }
+                }
+        }
+
+        return 0;
+}
+
+int perf_config(config_fn_t fn, void *data)
+{
+        if (perf_config_set__check() < 0)
+                return -1;
+
+        return perf_config_set__iter(config_set, fn, data);
+}
+
 static void perf_config_item__delete(struct perf_config_item *item)
 {
         zfree(&item->name);
--
2.5.0
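With this change the caller's side is unchanged: commands still pass a
config_fn_t callback, but only the first perf_config() call pays for
parsing the files. A minimal hypothetical caller (the key name is made
up purely for illustration):

static int ui_config(const char *var, const char *value, void *data)
{
        int *use_browser = data;

        /* hypothetical key, shown only to illustrate the callback shape */
        if (!strcmp(var, "tui.enabled"))
                *use_browser = perf_config_bool(var, value);
        return 0;
}

...
int use_browser = 0;

perf_config(ui_config, &use_browser); /* first call builds config_set */
perf_config(ui_config, &use_browser); /* later calls reuse it */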
[PATCH v2 3/5] perf config: Modify perf_config_set__delete() using global variable 'config_set'
perf_config_set__delete() used to free the allocated config set passed in
by the caller, but the global variable 'config_set' is what is used all
around, so make it remove 'config_set' directly instead of taking a local
variable 'set'.

Cc: Namhyung Kim
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Alexander Shishkin
Signed-off-by: Taeung Song
---
 tools/perf/builtin-config.c | 2 +-
 tools/perf/util/config.c    | 8 ++++----
 tools/perf/util/config.h    | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-config.c b/tools/perf/builtin-config.c
index fe1b77f..8eef3fb 100644
--- a/tools/perf/builtin-config.c
+++ b/tools/perf/builtin-config.c
@@ -106,7 +106,7 @@ int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
                 usage_with_options(config_usage, config_options);
         }
 
-        perf_config_set__delete(set);
+        perf_config_set__delete();
 out_err:
         return ret;
 }
diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index 487d390..abfe1b2 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -594,7 +594,7 @@ static int collect_config(const char *var, const char *value,
 
 out_free:
         free(key);
-        perf_config_set__delete(set);
+        perf_config_set__delete();
         return -1;
 }
 
@@ -741,10 +741,10 @@ static void perf_config_set__purge(struct perf_config_set *set)
         }
 }
 
-void perf_config_set__delete(struct perf_config_set *set)
+void perf_config_set__delete(void)
 {
-        perf_config_set__purge(set);
-        free(set);
+        perf_config_set__purge(config_set);
+        zfree(&config_set);
 }
 
 /*
diff --git a/tools/perf/util/config.h b/tools/perf/util/config.h
index 22ec626..be4e603 100644
--- a/tools/perf/util/config.h
+++ b/tools/perf/util/config.h
@@ -21,6 +21,6 @@ struct perf_config_set {
 };
 
 struct perf_config_set *perf_config_set__new(void);
-void perf_config_set__delete(struct perf_config_set *set);
+void perf_config_set__delete(void);
 
 #endif /* __PERF_CONFIG_H */
--
2.5.0
[PATCH v2 1/5] perf config: Use new perf_config_set__init() to initialize config set
Instead of perf_config(), use the new perf_config_set__init() to
initialize the config set, collecting all configs from the config files
(i.e. the user config ~/.perfconfig and the system config
$(sysconfdir)/perfconfig). If the same config variable is present in both
the user and system config files, the user config has higher priority
than the system config.

Cc: Namhyung Kim
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Alexander Shishkin
Signed-off-by: Taeung Song
---
 tools/perf/util/config.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c
index dad7d82..5d01899 100644
--- a/tools/perf/util/config.c
+++ b/tools/perf/util/config.c
@@ -645,13 +645,61 @@ out_free:
         return -1;
 }
 
+static int perf_config_set__init(struct perf_config_set *set)
+{
+        int ret = 0, found = 0;
+        const char *home = NULL;
+
+        /* Setting $PERF_CONFIG makes perf read _only_ the given config file. */
+        if (config_exclusive_filename)
+                return perf_config_from_file(collect_config, config_exclusive_filename, set);
+        if (perf_config_system() && !access(perf_etc_perfconfig(), R_OK)) {
+                ret += perf_config_from_file(collect_config, perf_etc_perfconfig(), set);
+                found += 1;
+        }
+
+        home = getenv("HOME");
+        if (perf_config_global() && home) {
+                char *user_config = strdup(mkpath("%s/.perfconfig", home));
+                struct stat st;
+
+                if (user_config == NULL) {
+                        warning("Not enough memory to process %s/.perfconfig, "
+                                "ignoring it.", home);
+                        goto out;
+                }
+
+                if (stat(user_config, &st) < 0)
+                        goto out_free;
+
+                if (st.st_uid && (st.st_uid != geteuid())) {
+                        warning("File %s not owned by current user or root, "
+                                "ignoring it.", user_config);
+                        goto out_free;
+                }
+
+                if (!st.st_size)
+                        goto out_free;
+
+                ret += perf_config_from_file(collect_config, user_config, set);
+                found += 1;
+out_free:
+                free(user_config);
+        }
+out:
+        if (found == 0)
+                return -1;
+        return ret;
+}
+
 struct perf_config_set *perf_config_set__new(void)
 {
         struct perf_config_set *set = zalloc(sizeof(*set));
 
         if (set) {
                 INIT_LIST_HEAD(&set->sections);
-                perf_config(collect_config, set);
+                if (perf_config_set__init(set) < 0)
+                        return NULL;
         }
 
         return set;
--
2.5.0
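As a concrete illustration of the priority rule (hypothetical file
contents): the system file is parsed first and the user file second, so
for a key present in both, the user's value is what the config set ends
up holding.

# $(sysconfdir)/perfconfig
[report]
        children = true

# ~/.perfconfig
[report]
        children = false

$ perf config --list
report.children=false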
[PATCH v2 5/5] perf config: Reset config set at only 'config' sub-command
When perf_config() is first called, the config set is initialized, but
the 'config' sub-command needs to reset the config set because of its
'--user' and '--system' options. These options select a particular config
file location, so the config set should be reinitialized, collecting
configs from the selected exclusive config file.

Cc: Namhyung Kim
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Alexander Shishkin
Signed-off-by: Taeung Song
---
 tools/perf/builtin-config.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-config.c b/tools/perf/builtin-config.c
index 4a61411..dc5b52f 100644
--- a/tools/perf/builtin-config.c
+++ b/tools/perf/builtin-config.c
@@ -47,7 +47,6 @@ static int show_config(const char *key, const char *value,
 int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
 {
         int ret = 0;
-        struct perf_config_set *set;
         char *user_config = mkpath("%s/.perfconfig", getenv("HOME"));
 
         argc = parse_options(argc, argv, config_options, config_usage,
@@ -65,11 +64,11 @@ int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
         else if (use_user_config)
                 config_exclusive_filename = user_config;
 
-        set = perf_config_set__new();
-        if (!set) {
-                ret = -1;
-                goto out_err;
-        }
+        /*
+         * Reset the config set at only the 'config' sub-command
+         * because of the options for the config file location.
+         */
+        perf_config_set__delete();
 
         switch (actions) {
         case ACTION_LIST:
@@ -92,6 +91,5 @@ int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
         }
 
         perf_config_set__delete();
-out_err:
         return ret;
 }
--
2.5.0
[RFC][PATCH v2 0/5] perf config: Reimplement perf_config() using perf_config_set__iter()
Every time perf_config() is called, it reads the config files all over
again (i.e. the user config '~/.perfconfig' and the system config
'$(sysconfdir)/perfconfig'). To avoid this repetitive work, use a 'struct
perf_config_set config_set' variable that already contains all config
key-value pairs in perf_config(). In other words, with the new
perf_config(), 'config_set' is initialized only on the first call,
collecting all configs from the config files, and then perf_config()
works with perf_config_set__iter(). Callers of the old perf_config() get
the same behaviour from the new perf_config(), minus the repetitive
reading of the config files.

IMHO this patchset is needed because not only should the repetitive work
be avoided, it will also make it easier to manage perf configs in the
near future. If you give me any feedback, I'd appreciate it. :)

Thanks,
Taeung

v2:
- split a patch into several patches
- reimplement show_config() using the new perf_config()
- modify perf_config_set__delete() using the global variable 'config_set'
- reset the config set when only the 'config' sub-command runs, because of
  its options for the config file location

Taeung Song (5):
  perf config: Use new perf_config_set__init() to initialize config set
  perf config: Reimplement perf_config() using perf_config_set__iter()
  perf config: Modify perf_config_set__delete() using global variable
    'config_set'
  perf config: Reimplement show_config() using perf_config()
  perf config: Reset config set at only 'config' sub-command

 tools/perf/builtin-config.c |  43 ++++-----
 tools/perf/util/config.c    | 156 +++++++++++++++++++++++---------
 tools/perf/util/config.h    |   2 +-
 3 files changed, 117 insertions(+), 84 deletions(-)

--
2.5.0
[PATCH v2 4/5] perf config: Reimplement show_config() using perf_config()
The old show_config() used the config set directly, duplicating a lot of
code from perf_config_set__iter(). So reimplement show_config() on top of
perf_config(), which uses perf_config_set__iter() with the config set
that already contains all configs.

Cc: Namhyung Kim
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Alexander Shishkin
Signed-off-by: Taeung Song
---
 tools/perf/builtin-config.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-config.c b/tools/perf/builtin-config.c
index 8eef3fb..4a61411 100644
--- a/tools/perf/builtin-config.c
+++ b/tools/perf/builtin-config.c
@@ -33,28 +33,13 @@ static struct option config_options[] = {
         OPT_END()
 };
 
-static int show_config(struct perf_config_set *set)
+static int show_config(const char *key, const char *value,
+                       void *cb __maybe_unused)
 {
-        struct perf_config_section *section;
-        struct perf_config_item *item;
-        struct list_head *sections;
-
-        if (set == NULL)
-                return -1;
-
-        sections = &set->sections;
-        if (list_empty(sections))
-                return -1;
-
-        list_for_each_entry(section, sections, node) {
-                list_for_each_entry(item, &section->items, node) {
-                        char *value = item->value;
-
-                        if (value)
-                                printf("%s.%s=%s\n", section->name,
-                                       item->name, value);
-                }
-        }
+        if (value)
+                printf("%s=%s\n", key, value);
+        else
+                printf("%s\n", key);
 
         return 0;
 }
@@ -92,7 +77,7 @@ int cmd_config(int argc, const char **argv, const char *prefix __maybe_unused)
                 pr_err("Error: takes no arguments\n");
                 parse_options_usage(config_usage, config_options, "l", 1);
         } else {
-                ret = show_config(set);
+                ret = perf_config(show_config, NULL);
                 if (ret < 0) {
                         const char *config_filename = config_exclusive_filename;
                         if (!config_exclusive_filename)
--
2.5.0
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v3
Hi Tomeu,

On 5 April 2016 at 16:07, Tomeu Vizoso wrote:
> On 4 April 2016 at 17:44, Daniel Stone wrote:
>> On 4 April 2016 at 14:55, Tomeu Vizoso wrote:
>>> +        if (async) {
>>> +                for_each_crtc_in_state(state, crtc, crtc_state, i) {
>>> +                        if (crtc->state->event ||
>>> +                            rockchip_drm_crtc_has_pending_event(crtc)) {
>>> +                                return -EBUSY;
>>> +                        }
>>> +                }
>>> +        }
>>
>> Hmmm ...
>>
>> Doesn't this trigger before the VOP atomic_begin() helper, meaning
>> that anything with an event set will trigger the check? Seems like it
>> should be && rather than ||.
>
> So, these are the two cases that this code aims to handle:
>
> 1. A previous request with an event set hasn't progressed to
> atomic_begin yet, so the event field in drm_crtc_state (at this point,
> the old state) is still populated but vop->event still isn't.

Ah right, this was what I was missing: the async (non-blocking)
implementation. Sounds good to me then.

Cheers,
Daniel
RE: [PATCH V7 00/11] Support for generic ACPI based PCI host controller
Hi Bjorn

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: 24 May 2016 00:39
> To: Gabriele Paoloni
> Cc: Lorenzo Pieralisi; Ard Biesheuvel; Jon Masters; Tomasz Nowicki;
> a...@arndb.de; will.dea...@arm.com; catalin.mari...@arm.com;
> raf...@kernel.org; hanjun@linaro.org; ok...@codeaurora.org;
> jchan...@broadcom.com; linaro-a...@lists.linaro.org;
> linux-p...@vger.kernel.org; dhd...@apm.com; liviu.du...@arm.com;
> dda...@caviumnetworks.com; jeremy.lin...@arm.com;
> linux-ker...@vger.kernel.org; linux-a...@vger.kernel.org;
> robert.rich...@caviumnetworks.com; suravee.suthikulpa...@amd.com;
> msal...@redhat.com; Wangyijing; m...@semihalf.com;
> andrea.ga...@linaro.org; linux-arm-ker...@lists.infradead.org
> Subject: Re: [PATCH V7 00/11] Support for generic ACPI based PCI host
> controller
>
> On Mon, May 23, 2016 at 03:16:01PM +0000, Gabriele Paoloni wrote:
> > Hi Lorenzo
> >
> > > -----Original Message-----
> > > From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
> > > Sent: 23 May 2016 11:57
> > > To: Ard Biesheuvel
> > > Cc: Gabriele Paoloni; Jon Masters; Tomasz Nowicki; helg...@kernel.org;
> > > a...@arndb.de; will.dea...@arm.com; catalin.mari...@arm.com;
> > > raf...@kernel.org; hanjun@linaro.org; ok...@codeaurora.org;
> > > jchan...@broadcom.com; linaro-a...@lists.linaro.org;
> > > linux-p...@vger.kernel.org; dhd...@apm.com; liviu.du...@arm.com;
> > > dda...@caviumnetworks.com; jeremy.lin...@arm.com;
> > > linux-ker...@vger.kernel.org; linux-a...@vger.kernel.org;
> > > robert.rich...@caviumnetworks.com; suravee.suthikulpa...@amd.com;
> > > msal...@redhat.com; Wangyijing; m...@semihalf.com;
> > > andrea.ga...@linaro.org; linux-arm-ker...@lists.infradead.org
> > > Subject: Re: [PATCH V7 00/11] Support for generic ACPI based PCI host
> > > controller
> > >
> > > On Fri, May 20, 2016 at 11:14:03AM +0200, Ard Biesheuvel wrote:
> > > > On 20 May 2016 at 10:40, Gabriele Paoloni wrote:
> > > > > Hi Ard
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
> > > > [...]
> > > > >>
> > > > >> Is the PCIe root complex so special that you cannot simply
> > > > >> describe an implementation that is not PNP0408 compatible as
> > > > >> something else, under its own unique HID? If everybody is onboard
> > > > >> with using ACPI, how is this any different from describing other
> > > > >> parts of the platform topology? Even if the SBSA mandates generic
> > > > >> PCI, they already deviated from that when they built the hardware,
> > > > >> so pretending that it is a PNP0408 with quirks really does not buy
> > > > >> us anything.
> > > > >
> > > > > From my understanding we want to avoid this as this would allow
> > > > > each vendor to come up with his own code and it would be much more
> > > > > effort for the PCI maintainer to rework the PCI framework to
> > > > > accommodate X86 and "all" ARM64 Host Controllers...
> > > > >
> > > > > I guess this approach is too risky and we want to avoid this.
> > > > > Through standardization we can more easily maintain the code and
> > > > > scale it to multiple SoCs...
> > > > >
> > > > > So this is my understanding; maybe Jon, Tomasz or Lorenzo can give
> > > > > a bit more explanation...
> > > >
> > > > OK, so that boils down to recommending to vendors to represent known
> > > > non-compliant hardware as compliant, just so that we don't have to
> > > > change the code to support additional flavors of ECAM? It's fine to
> > > > be pragmatic, but that sucks.
> > > >
> > > > We keep confusing the x86 case with the ARM case here: for x86, they
> > > > needed to deal with broken hardware *after* the fact, and all they
> > > > could do is find /some/ distinguishing feature in order to guess
> > > > which exact hardware they might be running on. For arm64, it is the
> > > > opposite case. We are currently in a position where we can demand
> > > > vendors to comply with the standards they endorsed themselves, and
> > > > (ab)using ACPI + DMI as a de facto platform description rather than
> > > > plain ACPI makes me think the DT crowd were actually right from the
> > > > beginning. It *directly* violates the standardization principle,
> > > > since it requires a priori knowledge inside the OS that a certain
> > > > 'generic' device must be driven in a special way.
> > > >
> > > > So can anyone comment on the feasibility of adding support for
> > > > devices with vendor specific HIDs (and no generic CIDs) to the
> > > > current ACPI ECAM driver in Linux?
>
> I don't think of ECAM support itself as a "driver". It's just a
> service available to drivers, similar to OF resource parsing.
>
> Per PCI Firmware r3.2, sec 4.1.5, "PNP0A03" means a PCI/PCI-X/PCIe
> host bridge. "PNP0A08" means a PCI-X Mode 2 or PCIe bridge that
> supports extended config space.
[PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
As per the docs, atomic_commit should return -EBUSY "if an asynchronous
update is requested and there is an earlier update pending".

v2: Use the status of the workqueue instead of vop->event, and don't add
a superfluous wait on the workqueue.

v3: Drop work_busy, as there's a sizeable delay when the worker
finishes, which introduces a race in which the client has already
received the last flip event but the next page flip ioctl will still
return -EBUSY because work_busy returns outdated information.

v4: Hold dev->event_lock while checking the VOP's event field as
suggested by Daniel Stone.

v5: Only block on outstanding work if it's a blocking call.

Signed-off-by: Tomeu Vizoso
---
 drivers/gpu/drm/rockchip/rockchip_drm_drv.h |  1 +
 drivers/gpu/drm/rockchip/rockchip_drm_fb.c  | 25 ++++++++++++++++++++++---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c |  6 ++++++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
index 56f43a364c7f..0b46617decd9 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
@@ -76,6 +76,7 @@ void rockchip_drm_atomic_work(struct work_struct *work);
 int rockchip_register_crtc_funcs(struct drm_crtc *crtc,
                                  const struct rockchip_crtc_funcs *crtc_funcs);
 void rockchip_unregister_crtc_funcs(struct drm_crtc *crtc);
+bool rockchip_drm_crtc_has_pending_event(struct drm_crtc *crtc);
 int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
                                    struct device *dev);
 void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
index 755cfdba61cd..e9531353b8d2 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_fb.c
@@ -279,15 +279,34 @@ int rockchip_drm_atomic_commit(struct drm_device *dev,
 {
         struct rockchip_drm_private *private = dev->dev_private;
         struct rockchip_atomic_commit *commit = &private->commit;
-        int ret;
+        struct drm_crtc_state *crtc_state;
+        struct drm_crtc *crtc;
+        unsigned long flags;
+        int i, ret;
+
+        if (nonblock) {
+                for_each_crtc_in_state(state, crtc, crtc_state, i) {
+                        spin_lock_irqsave(&dev->event_lock, flags);
+
+                        if (crtc->state->event ||
+                            rockchip_drm_crtc_has_pending_event(crtc)) {
+                                spin_unlock_irqrestore(&dev->event_lock, flags);
+                                return -EBUSY;
+                        }
+
+                        spin_unlock_irqrestore(&dev->event_lock, flags);
+                }
+        }
 
         ret = drm_atomic_helper_prepare_planes(dev, state);
         if (ret)
                 return ret;
 
-        /* serialize outstanding nonblocking commits */
         mutex_lock(&commit->lock);
-        flush_work(&commit->work);
+
+        /* serialize outstanding nonblocking commits */
+        if (!nonblock)
+                flush_work(&commit->work);
 
         drm_atomic_helper_swap_state(dev, state);
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index 1c4d5b5a70a2..3f980f52c640 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -836,6 +836,12 @@ static const struct drm_plane_funcs vop_plane_funcs = {
         .atomic_destroy_state = vop_atomic_plane_destroy_state,
 };
 
+bool rockchip_drm_crtc_has_pending_event(struct drm_crtc *crtc)
+{
+        assert_spin_locked(&crtc->dev->event_lock);
+        return to_vop(crtc)->event;
+}
+
 static int vop_crtc_enable_vblank(struct drm_crtc *crtc)
 {
         struct vop *vop = to_vop(crtc);
--
2.5.5
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
On 24 May 2016 at 09:27, Tomeu Vizoso wrote:
> As per the docs, atomic_commit should return -EBUSY "if an asynchronous
> update is requested and there is an earlier update pending".
>
> v2: Use the status of the workqueue instead of vop->event, and don't add
> a superfluous wait on the workqueue.
>
> v3: Drop work_busy, as there's a sizeable delay when the worker
> finishes, which introduces a race in which the client has already
> received the last flip event but the next page flip ioctl will still
> return -EBUSY because work_busy returns outdated information.
>
> v4: Hold dev->event_lock while checking the VOP's event field as
> suggested by Daniel Stone.
>
> v5: Only block on outstanding work if it's a blocking call.
>
> Signed-off-by: Tomeu Vizoso

Reviewed-by: Daniel Stone

Cheers,
Daniel
Re: [PATCH 00/11] Clock improvement for video playback
Hello Gabriel,

Tested with success on stih407 family platforms.

Reviewed-by: Arnaud Pouliquen
Tested-by: Arnaud Pouliquen

Regards

Arnaud

On 05/18/2016 10:41 AM, Gabriel Fernandez wrote:
> This series allows to increase video resolutions and make audio
> adjustments during a video playback.
>
> Gabriel Fernandez (11):
>   drivers: clk: st: Add fs660c32 synthesizer algorithm
>   drivers: clk: st: Add clock propagation for audio clocks
>   drivers: clk: st: Handle clkgenD2 clk synchronous mode
>   ARM: DT: STiH407: Add compatibility string on clkgend0 for audio
>     clocks
>   ARM: DT: STiH410: Add compatibility string on clkgend0 for audio
>     clocks
>   ARM: DT: STiH418: Add compatibility string on clkgend0 for audio
>     clocks
>   ARM: DT: STiH407: Enable synchronous clock mode on clkgend2
>   ARM: DT: STiH410: Enable synchronous clock mode on clkgend2
>   ARM: DT: STiH418: Enable synchronous clock mode on clkgend2
>   ARM: DT: STi: STiH407: clock configuration to address 720p and 1080p
>   ARM: DT: STi: STiH410: clock configuration to address 720p and 1080p
>
>  .../devicetree/bindings/clock/st/st,flexgen.txt |   3 +
>  arch/arm/boot/dts/stih407-clock.dtsi            |   4 +-
>  arch/arm/boot/dts/stih407.dtsi                  |  16 ++-
>  arch/arm/boot/dts/stih410-clock.dtsi            |   4 +-
>  arch/arm/boot/dts/stih410.dtsi                  |  16 ++-
>  arch/arm/boot/dts/stih418-clock.dtsi            |   4 +-
>  drivers/clk/st/clk-flexgen.c                    |  61 +++++++++-
>  drivers/clk/st/clkgen-fsyn.c                    | 159 +++++++++++++++++++-----
>  8 files changed, 211 insertions(+), 56 deletions(-)
>
RE: [PATCH RFC kernel] balloon: speed up inflating/deflating process
> On Fri, 20 May 2016 17:59:46 +0800
> Liang Li wrote:
>
> > The implementation of the current virtio-balloon is not very
> > efficient. Below is the test result of the time spent on inflating
> > the balloon to 3GB of a 4GB idle guest:
> >
> > a. allocating pages (6.5%, 103ms)
> > b. sending PFNs to host (68.3%, 787ms)
> > c. address translation (6.1%, 96ms)
> > d. madvise (19%, 300ms)
> >
> > It takes about 1577ms for the whole inflating process to complete.
> > The test shows that the bottleneck is stage b and stage d.
> >
> > If using a bitmap to send the page info instead of the PFNs, we can
> > reduce the overhead spent on stage b quite a lot. Furthermore, it's
> > possible to do the address translation and do the madvise with a bulk
> > of pages, instead of the current page per page way, so the overhead of
> > stage c and stage d can also be reduced a lot.
> >
> > This patch is the kernel side implementation which is intended to
> > speed up the inflating & deflating process by adding a new feature to
> > the virtio-balloon device. And now, inflating the balloon to 3GB of a
> > 4GB idle guest only takes 175ms, it's about 9 times as fast as before.
> >
> > TODO: optimize stage a by allocating/freeing a chunk of pages instead
> > of a single page at a time.
>
> Not commenting on the approach, but...
>
> >
> > Signed-off-by: Liang Li
> > ---
> >  drivers/virtio/virtio_balloon.c     | 199 ++++++++++++++++++++++++++++++++--
> >  include/uapi/linux/virtio_balloon.h |   1 +
> >  mm/page_alloc.c                     |   6 ++
> >  3 files changed, 198 insertions(+), 8 deletions(-)
> >
> >
> >  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> >  {
> > -        struct scatterlist sg;
> >          unsigned int len;
> >
> > -        sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > +        if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> > +                u32 page_shift = PAGE_SHIFT;
> > +                unsigned long start_pfn, end_pfn, flags = 0, bmap_len;
> > +                struct scatterlist sg[5];
> > +
> > +                start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> > +                end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> > +                bmap_len = (end_pfn - start_pfn) / BITS_PER_LONG * sizeof(long);
> > +
> > +                sg_init_table(sg, 5);
> > +                sg_set_buf(&sg[0], &flags, sizeof(flags));
> > +                sg_set_buf(&sg[1], &start_pfn, sizeof(start_pfn));
> > +                sg_set_buf(&sg[2], &page_shift, sizeof(page_shift));
> > +                sg_set_buf(&sg[3], &bmap_len, sizeof(bmap_len));
> > +                sg_set_buf(&sg[4], vb->page_bitmap +
> > +                           (start_pfn / BITS_PER_LONG), bmap_len);
> > +                virtqueue_add_outbuf(vq, sg, 5, vb, GFP_KERNEL);
> > +
>
> ...you need to take care of the endianness of the data you put on the
> queue, otherwise virtio-1 on big endian won't work. (There's just been a
> patch for that problem.)

OK, thanks for the reminder.

Liang
RE: [PATCH RFC kernel] balloon: speed up inflating/deflating process
> On 20/05/2016 11:59, Liang Li wrote:
> > +
> > +                sg_init_table(sg, 5);
> > +                sg_set_buf(&sg[0], &flags, sizeof(flags));
> > +                sg_set_buf(&sg[1], &start_pfn, sizeof(start_pfn));
> > +                sg_set_buf(&sg[2], &page_shift, sizeof(page_shift));
> > +                sg_set_buf(&sg[3], &bmap_len, sizeof(bmap_len));
>
> These four should probably be placed in a single struct and therefore a
> single sg entry. It might even be faster to place it together with the
> bitmap, thus avoiding the use of indirect descriptors.

Yes, thanks for your suggestion.

> You should also test ballooning of a 64GB guest after filling in the
> page cache, not just ballooning of a freshly booted 4GB guest. This will
> give you a much more sparse bitmap. Still, the improvement in sending
> PFNs to the host is impressive.

I will include the test result for that case in the next version.

Thanks,
Liang

> Thanks,
>
> Paolo
>
> > +                sg_set_buf(&sg[4], vb->page_bitmap +
> > +                           (start_pfn / BITS_PER_LONG), bmap_len);
> > +                virtqueue_add_outbuf(vq, sg, 5, vb, GFP_KERNEL);
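Putting both review comments together, the header could look something
like this (the struct name, field names and layout are hypothetical, not
the final virtio interface):

/* one header in one sg entry, fields stored in virtio endianness */
struct balloon_bmap_hdr {
        __virtio64 start_pfn;
        __virtio64 bmap_len;
        __virtio32 flags;
        __virtio32 page_shift;
};

struct balloon_bmap_hdr hdr;
struct scatterlist sg[2];

hdr.start_pfn  = cpu_to_virtio64(vb->vdev, start_pfn);
hdr.bmap_len   = cpu_to_virtio64(vb->vdev, bmap_len);
hdr.flags      = cpu_to_virtio32(vb->vdev, flags);
hdr.page_shift = cpu_to_virtio32(vb->vdev, PAGE_SHIFT);

sg_init_table(sg, 2);
sg_set_buf(&sg[0], &hdr, sizeof(hdr));
sg_set_buf(&sg[1], vb->page_bitmap + (start_pfn / BITS_PER_LONG), bmap_len);
virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL);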
Re: [PATCH 8/8] af_unix: charge buffers to kmemcg
[adding netdev to Cc] On Mon, May 23, 2016 at 01:20:29PM +0300, Vladimir Davydov wrote: > Unix sockets can consume a significant amount of system memory, hence > they should be accounted to kmemcg. > > Since unix socket buffers are always allocated from process context, > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in > sock->sk_allocation mask. > > Signed-off-by: Vladimir Davydov > Cc: "David S. Miller" > --- > net/unix/af_unix.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 80aa6a3e6817..022bdd3ab7d9 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -769,6 +769,7 @@ static struct sock *unix_create1(struct net *net, struct > socket *sock, int kern) > lockdep_set_class(&sk->sk_receive_queue.lock, > &af_unix_sk_receive_queue_lock_key); > > + sk->sk_allocation = GFP_KERNEL_ACCOUNT; > sk->sk_write_space = unix_write_space; > sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen; > sk->sk_destruct = unix_sock_destructor;
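For reference, the GFP_KERNEL_ACCOUNT mask used here is simply GFP_KERNEL plus the kmemcg accounting bit (include/linux/gfp.h), so every buffer allocation done with sk->sk_allocation from now on is charged to the owning cgroup:

	#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)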
[PATCH v4] KVM: halt-polling: poll for the upcoming fire timers
From: Wanpeng Li If an emulated lapic timer will fire soon (within 10us, the base of dynamic halt-polling; the lower end of message-passing workload latency, TCP_RR's poll time, is < 10us) we can treat it as a short halt and poll to wait for it to fire. The fire callback apic_timer_fn() will set KVM_REQ_PENDING_TIMER, and this flag will be checked during the busy poll. This avoids the context switch overhead and the latency of waking up the vCPU. This feature is slightly different from the current advance-expiration approach. Advance expiration relies on the vCPU running (it polls before vmentry). But in some cases the timer interrupt may be blocked by another thread (i.e., the IF bit is clear) and the vCPU cannot be scheduled to run immediately. So even if the timer is advanced early, the vCPU may still see the latency. Polling is different: it ensures the vCPU is aware of the timer expiration before it is scheduled out. The benchmark below was collected with echo HRTICK > /sys/kernel/debug/sched_features in dynticks guests. Context switching - times in microseconds - smaller is better - Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw - - -- -- -- -- -- --- --- kernel Linux 4.6.0+ 7.9800 11.0 10.8 14.6 9.4300 13.0 10.2 vanilla kernel Linux 4.6.0+ 15.3 13.6 10.7 12.5 9.1 2.8 7.38000 poll Cc: Paolo Bonzini Cc: Radim Krčmář Cc: David Matlack Cc: Christian Borntraeger Cc: Yang Zhang Signed-off-by: Wanpeng Li --- v3 -> v4: * add module parameter halt_poll_ns_timer * rename patch subject since lapic may be just for x86. v2 -> v3: * add Yang's statement to patch description v1 -> v2: * add return statement to non-x86 archs * capture the never-expire case for x86 (hrtimer is not started) arch/arm/include/asm/kvm_host.h | 4 arch/arm64/include/asm/kvm_host.h | 4 arch/mips/include/asm/kvm_host.h| 4 arch/powerpc/include/asm/kvm_host.h | 4 arch/s390/include/asm/kvm_host.h| 4 arch/x86/kvm/lapic.c| 11 +++ arch/x86/kvm/lapic.h| 1 + arch/x86/kvm/x86.c | 5 + include/linux/kvm_host.h| 1 + virt/kvm/kvm_main.c | 15 +++ 10 files changed, 49 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 0df6b1f..fdfbed9 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -292,6 +292,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu) +{ + return -1ULL; +} static inline void kvm_arm_init_debug(void) {} static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {} diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e63d23b..f510d71 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -371,6 +371,10 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu) +{ + return -1ULL; +} void kvm_arm_init_debug(void); void kvm_arm_setup_debug(struct kvm_vcpu *vcpu); diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 6733ac5..baf9472 100644 --- a/arch/mips/include/asm/kvm_host.h +++ 
b/arch/mips/include/asm/kvm_host.h @@ -814,6 +814,10 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} +static inline u64 kvm_arch_timer_remaining(struct kvm_vcpu *vcpu) +{ + return -1ULL; +} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} #endif /* __MIPS_KVM_HOST_H__ */ diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ec35af3..5986c79 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -729,5 +729,9 @@ static inline void kvm_arch_exit(void) {} static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} +static inline u64 kvm_arch_timer_rem
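The virt/kvm/kvm_main.c hunk is not quoted above; here is a hedged sketch of how the generic halt-polling side presumably consumes the new hook, based on the halt_poll_ns_timer module parameter named in the changelog (everything except kvm_arch_timer_remaining() is an assumption):

	/* in kvm_vcpu_block(), when sizing the busy-poll window (sketch) */
	u64 timer_ns = kvm_arch_timer_remaining(vcpu);

	if (timer_ns <= halt_poll_ns_timer) {
		/*
		 * The emulated timer fires soon: treat this as a short halt
		 * and keep polling; apic_timer_fn() sets KVM_REQ_PENDING_TIMER,
		 * which the poll loop checks. The -1ULL "never expires" case
		 * naturally fails this test.
		 */
		poll_ns = max(poll_ns, timer_ns);
	}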
Re: [PATCH] PCI: pcie: Call pm_runtime_no_callbacks() after device is registered
On Mon, May 23, 2016 at 02:55:00PM -0500, Bjorn Helgaas wrote: > On Mon, May 23, 2016 at 11:11:55AM +0300, Mika Westerberg wrote: > > Commit 0195d2813547 ("PCI: Add runtime PM support for PCIe ports") added > > call to pm_runtime_no_callbacks() for each port service device to prevent > > them exposing unnecessary runtime PM sysfs files. However, that function > > tries to acquire dev->power.lock which is not yet initialized. > > > > This triggers following splat: > > > > BUG: spinlock bad magic on CPU#0, swapper/0/1 > > lock: 0x8801be2aa8e8, .magic: , .owner: /-1, > > .owner_cpu: 0 > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0+ #820 > > 8801beb97be0 812cf42d > > 8801be2aa8e8 8801beb97c00 8109ee58 8801be2aa8e8 > > 8801be2aa8e8 8801beb97c30 8109efd9 8801be2aa8e8 > > Call Trace: > > [] dump_stack+0x4f/0x72 > > [] spin_dump+0x78/0xc0 > > [] do_raw_spin_lock+0xf9/0x150 > > [] _raw_spin_lock_irq+0x20/0x30 > > [] pm_runtime_no_callbacks+0x1e/0x40 > > [] pcie_port_device_register+0x1fd/0x4e0 > > [] pcie_portdrv_probe+0x38/0xa0 > > [] local_pci_probe+0x45/0xa0 > > [] ? pci_match_device+0xe0/0x110 > > [] pci_device_probe+0xdb/0x130 > > [] driver_probe_device+0x22c/0x440 > > [] __driver_attach+0xd1/0xf0 > > [] ? driver_probe_device+0x440/0x440 > > [] bus_for_each_dev+0x64/0xa0 > > [] driver_attach+0x1e/0x20 > > [] bus_add_driver+0x1eb/0x280 > > [] ? pcie_port_setup+0x7c/0x7c > > [] driver_register+0x60/0xe0 > > [] __pci_register_driver+0x60/0x70 > > [] pcie_portdrv_init+0x63/0x75 > > [] do_one_initcall+0xab/0x1c0 > > [] kernel_init_freeable+0x153/0x1d9 > > [] kernel_init+0xe/0x100 > > [] ret_from_fork+0x22/0x40 > > [] ? rest_init+0x90/0x90 > > > > Fix this by calling pm_runtime_no_callbacks() after device_register() just > > like other buses, like I2C is doing already. > > > > Reported-by: Valdis Kletnieks > > Tested-by: Valdis Kletnieks > > Suggested-by: Lukas Wunner > > Signed-off-by: Mika Westerberg > > I think this is a bugfix for "PCI: Add runtime PM support for PCIe > ports", so I folded this into that patch since it hasn't been merged > yet. Is that the right place for it? Yes, that's right. Thanks!
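The fix itself is just an ordering change; a minimal sketch of the corrected sequence in pcie_port_device_register() (error handling elided):

	retval = device_register(device);
	if (retval)
		return retval;

	/* dev->power.lock is initialized by now, so this is safe */
	pm_runtime_no_callbacks(device);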
Re: [PATCH] hwrng: stm32 - fix build warning
2016-05-23 22:35 GMT+02:00 Arnd Bergmann : > On Monday, May 23, 2016 6:14:08 PM CEST Sudip Mukherjee wrote: >> We have been getting a build warning about: >> drivers/char/hw_random/stm32-rng.c: In function 'stm32_rng_read': >> drivers/char/hw_random/stm32-rng.c:82:19: warning: 'sr' may be used >> uninitialized in this function >> >> On checking the code it turns out that sr can never be used >> uninitialized as sr is getting initialized in the while loop and the while >> loop will always execute as the minimum value of max is 32. >> So just initialize sr to 0 while declaring it to silence the compiler. >> >> Signed-off-by: Sudip Mukherjee >> --- > > I notice that you are using a really old compiler. While this warning > seems to be valid in the sense that the compiler should figure out that > the variable might be used uninitialized, please update your toolchain > before reporting other such problems, as gcc-4.6 had a lot more false > positives than newer ones (5.x or 6.x) have. > >> >> build log at: >> https://travis-ci.org/sudipm-mukherjee/parport/jobs/132180906 >> >> drivers/char/hw_random/stm32-rng.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/char/hw_random/stm32-rng.c >> b/drivers/char/hw_random/stm32-rng.c >> index 92a8106..0533370 100644 >> --- a/drivers/char/hw_random/stm32-rng.c >> +++ b/drivers/char/hw_random/stm32-rng.c >> @@ -52,7 +52,7 @@ static int stm32_rng_read(struct hwrng *rng, void *data, >> size_t max, bool wait) >> { >> struct stm32_rng_private *priv = >> container_of(rng, struct stm32_rng_private, rng); >> - u32 sr; >> + u32 sr = 0; >> int retval = 0; >> >> pm_runtime_get_sync((struct device *) priv->rng.priv); > > Does this work as well? > > diff --git a/drivers/char/hw_random/stm32-rng.c > b/drivers/char/hw_random/stm32-rng.c > index 92a810648bd0..5c836b0afa40 100644 > --- a/drivers/char/hw_random/stm32-rng.c > +++ b/drivers/char/hw_random/stm32-rng.c > @@ -79,7 +79,7 @@ static int stm32_rng_read(struct hwrng *rng, void *data, > size_t max, bool wait) > max -= sizeof(u32); > } > > - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > + if (WARN_ONCE(retval > 0 && (sr & (RNG_SR_SEIS | RNG_SR_CEIS)), > "bad RNG status - %x\n", sr)) > writel_relaxed(0, priv->base + RNG_SR); > > I think it would be nicer to not add a bogus initialization. Hmm, not sure this is nicer. The while loop can break before retval is incremented when the sr value is not the expected one (sr != RNG_SR_DRDY). In that case, we certainly want to print the sr value. Maybe the better way is just to initialize sr with the status register content? diff --git a/drivers/char/hw_random/stm32-rng.c b/drivers/char/hw_random/stm32-rng.c index 92a810648bd0..07a6659d0fe6 100644 --- a/drivers/char/hw_random/stm32-rng.c +++ b/drivers/char/hw_random/stm32-rng.c @@ -57,8 +57,8 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) pm_runtime_get_sync((struct device *) priv->rng.priv); + sr = readl_relaxed(priv->base + RNG_SR); while (max > sizeof(u32)) { - sr = readl_relaxed(priv->base + RNG_SR); if (!sr && wait) { unsigned int timeout = RNG_TIMEOUT; @@ -77,6 +77,8 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) retval += sizeof(u32); data += sizeof(u32); max -= sizeof(u32); + + sr = readl_relaxed(priv->base + RNG_SR); } if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), Regards, Maxime
Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up
On Mon, May 23, 2016 at 11:58:51AM +0100, Morten Rasmussen wrote: > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric > configurations SD_WAKE_AFFINE is only desirable if the waking task's > compute demand (utilization) is suitable for the cpu capacities > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup > balancing take over (find_idlest_{group, cpu}()). > > The assumption is that SD_WAKE_AFFINE is never set for a sched_domain > containing cpus with different capacities. This is enforced by a > previous patch based on the SD_ASYM_CPUCAPACITY flag. > > Ideally, we shouldn't set 'want_affine' in the first place, but we don't > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start > traversing them. > > cc: Ingo Molnar > cc: Peter Zijlstra > > Signed-off-by: Morten Rasmussen > --- > kernel/sched/fair.c | 28 +++- > 1 file changed, 27 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 564215d..ce44fa7 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -114,6 +114,12 @@ unsigned int __read_mostly sysctl_sched_shares_window = > 1000UL; > unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL; > #endif > > +/* > + * The margin used when comparing utilization with cpu capacity: > + * util * 1024 < capacity * margin > + */ > +unsigned int capacity_margin = 1280; /* ~20% */ > + > static inline void update_load_add(struct load_weight *lw, unsigned long inc) > { > lw->weight += inc; > @@ -5293,6 +5299,25 @@ static int cpu_util(int cpu) > return (util >= capacity) ? capacity : util; > } > > +static inline int task_util(struct task_struct *p) > +{ > + return p->se.avg.util_avg; > +} > + > +static int wake_cap(struct task_struct *p, int cpu, int prev_cpu) > +{ > + long delta; > + long prev_cap = capacity_of(prev_cpu); > + > + delta = cpu_rq(cpu)->rd->max_cpu_capacity - prev_cap; > + > + /* prev_cpu is fairly close to max, no need to abort wake_affine */ > + if (delta < prev_cap >> 3) > + return 0; delta can be negative? still return 0? > + > + return prev_cap * 1024 < task_util(p) * capacity_margin; > +}
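A worked example of the wake_cap() margin check, with illustrative numbers not taken from the patch:

	/*
	 * Illustrative big.LITTLE case: rd->max_cpu_capacity = 1024 and
	 * prev_cpu is a little cpu with capacity_of(prev_cpu) = 430.
	 *
	 *   delta = 1024 - 430 = 594,  prev_cap >> 3 = 53
	 *
	 * so the early bail-out is not taken. For a waking task with
	 * p->se.avg.util_avg = 400:
	 *
	 *   prev_cap * 1024 = 440320  <  task_util(p) * capacity_margin = 512000
	 *
	 * so wake_cap() returns 1, 'want_affine' is dropped, and wakeup
	 * balancing via find_idlest_{group,cpu}() takes over instead.
	 */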
Re: [PATCH 04/16] sched/fair: Optimize find_idlest_cpu() when there is no choice
On Tue, May 24, 2016 at 08:29:05AM +0200, Mike Galbraith wrote: > On Mon, 2016-05-23 at 11:58 +0100, Morten Rasmussen wrote: > > In the current find_idlest_group()/find_idlest_cpu() search we end up > > calling find_idlest_cpu() in a sched_group containing only one cpu in > > the end. Checking idle-states becomes pointless when there is no > > alternative, so bail out instead. > > > > cc: Ingo Molnar > > cc: Peter Zijlstra > > > > Signed-off-by: Morten Rasmussen > > --- > > kernel/sched/fair.c | 5 + > > 1 file changed, 5 insertions(+) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 0fe3020..564215d 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -5155,6 +5155,11 @@ find_idlest_cpu(struct sched_group *group, struct > > task_struct *p, int this_cpu) > > > > int shallowest_idle_cpu = -1; > > > > int i; > > > > + /* Check if we have any choice */ > > + if (group->group_weight == 1) { > > + return cpumask_first(sched_group_cpus(group)); > > + } > > + > > Hm, if task isn't allowed there, too bad? Is that possible for single-cpu groups? I thought we skipped groups with no cpus allowed in find_idlest_group(): /* Skip over this group if it has no CPUs allowed */ if (!cpumask_intersects(sched_group_cpus(group), tsk_cpus_allowed(p))) continue; Since the group has at least one cpu allowed and only contains one cpu, that cpu must be allowed. No?
Re: [PATCH v6 11/12] zsmalloc: page migration support
Hello, On (05/24/16 15:28), Minchan Kim wrote: [..] > The most important point to me is that it makes the code *simple* at the cost of > additional wasted memory. Now, every zspage lives in *a* list so we don't > need to check the zspage's groupness to use list_empty of zspage. > I'm not sure how much you feel it simplifies the code. > However, while I implemented the page migration logic, the check with the condition > that a zspage's groupness is either almost_empty or almost_full was really > bogus and tricky to me, so I had to debug several times to find what was > wrong. > > Compared to the old code, zsmalloc is getting more complicated day by day, so I want to weight > toward *simple* for easy maintenance. > > One more note: > Now, ZS_EMPTY is used as a pool. Look at find_get_zspage. So adding > an "empty" column to ZSMALLOC_STAT might be worthwhile, but I wanted to handle it > as another topic. > > So if you don't feel strongly that the saving is really huge, I want to > go with this. And if we are adding more wasted memory in the future, > let's handle it then. oh, sure, all those micro-optimizations can be done later, off the series. > About CONFIG_ZSMALLOC_STAT, it might be off-topic. Frankly speaking, > I have guided the production team to enable it because when I profiled the > overhead caused by ZSMALLOC_STAT, there was no performance loss > in real workloads. However, the stat gives more detailed useful > information. ok, agree. good to know that you use stats in production, by the way. [..] > > > + pos = (((class->objs_per_zspage * class->size) * > > > + page_idx / class->pages_per_zspage) / class->size > > > + ) * class->size; > > > > > > something went wrong with the indentation here :) > > > > so... it's > > > > (((class->objs_per_zspage * class->size) * page_idx / > > class->pages_per_zspage) / class->size ) * class->size; > > > > the last ' / class->size ) * class->size' can be dropped, I think. > > You prove I didn't learn math. > Will drop it. haha, no, that wasn't the point :) great job with the series! [..] > > hm... zsmalloc is getting sooo complex now. > > > > `system_wq' -- can we have problems here when the system is getting > > low on memory and workers are getting increasingly busy trying to > > allocate the memory for some other purposes? > > > > _theoretically_ zsmalloc can stack a number of ready-to-release zspages, > > which won't be accessible to zsmalloc, nor will they be released. how likely > > is this? hm, can zsmalloc take zspages from that deferred release list when > > it wants to allocate a new zspage? > > Done. oh, good. that was a purely theoretical thing, and to continue with the theories, I assume that zs_malloc() will improve with this change. the sort of problem with zs_malloc(), *I think*, is that we release the class ->lock after a failed find_get_zspage(): handle = cache_alloc_handle(pool, gfp); if (!handle) return 0; zspage = find_get_zspage(class); if (likely(zspage)) { obj = obj_malloc(class, zspage, handle); [..] spin_unlock(&class->lock); return handle; } spin_unlock(&class->lock); zspage = alloc_zspage(pool, class, gfp); if (!zspage) { cache_free_handle(pool, handle); return 0; } spin_lock(&class->lock); obj = obj_malloc(class, zspage, handle); [..] spin_unlock(&class->lock); _theoretically_, on a not-really-huge system, let's say 64 CPUs for example, we can have 64 write paths trying to store objects of size OBJ_SZ to a ZS_FULL class-OBJSZ. 
the write path (each of them) will fail on find_get_zspage(), unlock the class ->lock (so another write path will have its chance to fail on find_get_zspage()), alloc_zspage(), create a page chain, spin on class ->lock to add the new zspage to the class. so we can end up allocating up to 64 zspages, each of which will carry N PAGE_SIZE pages. those zspages, at least at the beginning, will store only one object per zspage, which will blow up internal fragmentation and can cause more compaction/migration/etc later on. well, it's a bit pessimistic, but I think to _some extent_ this scenario is quite possible. I assume that this "pick an already marked for release zspage" thing is happening as a fast path within the first class ->lock section, so the rest of concurrent write requests that are spinning on the class ->lock at the moment will see a zspage, instead of !find_get_zspage(). -ss
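A sketch of one way to narrow that window: re-check the class after retaking the lock, and give back the freshly built zspage if somebody else repopulated the class meanwhile (free_zspage() here is an illustrative stand-in, not necessarily the series' actual helper):

	zspage = alloc_zspage(pool, class, gfp);
	if (!zspage) {
		cache_free_handle(pool, handle);
		return 0;
	}

	spin_lock(&class->lock);
	existing = find_get_zspage(class);
	if (existing) {
		/* a concurrent writer beat us; avoid the
		 * one-object-per-zspage blow-up described above */
		free_zspage(pool, zspage);
		zspage = existing;
	}
	obj = obj_malloc(class, zspage, handle);
	spin_unlock(&class->lock);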
[PATCH] usb: core: add debugobjects support for urb object
From: "Du, Changbin" Add debugobject support to track the life time of struct urb. This feature help us detect violation of urb operations by generating a warning message from debugobject core. And we fix the possible issues at runtime to avoid oops if we can. I have done some tests with some class drivers, no violation found in them which is good. Expect this feature can be used for debugging future problems. Signed-off-by: Du, Changbin --- drivers/usb/core/hcd.c | 1 + drivers/usb/core/urb.c | 117 +++-- include/linux/usb.h| 8 lib/Kconfig.debug | 8 4 files changed, 130 insertions(+), 4 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 34b837a..a8ea128 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1750,6 +1750,7 @@ static void __usb_hcd_giveback_urb(struct urb *urb) /* pass ownership to the completion handler */ urb->status = status; + debug_urb_deactivate(urb); /* * We disable local IRQs here avoid possible deadlock because * drivers may call spin_lock() to hold lock which might be diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c index c601e25..0d1eccb 100644 --- a/drivers/usb/core/urb.c +++ b/drivers/usb/core/urb.c @@ -10,6 +10,100 @@ #define to_urb(d) container_of(d, struct urb, kref) +#ifdef CONFIG_DEBUG_OBJECTS_URB +static struct debug_obj_descr urb_debug_descr; + +static void *urb_debug_hint(void *addr) +{ + return ((struct urb *) addr)->complete; +} + +/* + * fixup_init is called when: + * - an active object is initialized + */ +static bool urb_fixup_init(void *addr, enum debug_obj_state state) +{ + struct urb *urb = addr; + + switch (state) { + case ODEBUG_STATE_ACTIVE: + usb_kill_urb(urb); + debug_object_init(urb, &urb_debug_descr); + return true; + default: + return false; + } +} + +/* + * fixup_activate is called when: + * - an active object is activated + * - an unknown non-static object is activated + */ +static bool urb_fixup_activate(void *addr, enum debug_obj_state state) +{ + struct urb *urb = urb; + + switch (state) { + case ODEBUG_STATE_ACTIVE: + usb_kill_urb(urb); + debug_object_activate(urb, &urb_debug_descr); + return true; + default: + return false; + } +} + +/* + * fixup_free is called when: + * - an active object is freed + */ +static bool urb_fixup_free(void *addr, enum debug_obj_state state) +{ + struct urb *urb = addr; + + switch (state) { + case ODEBUG_STATE_ACTIVE: + usb_kill_urb(urb); + debug_object_free(urb, &urb_debug_descr); + return true; + default: + return false; + } +} + +static struct debug_obj_descr urb_debug_descr = { + .name = "urb", + .debug_hint = urb_debug_hint, + .fixup_init = urb_fixup_init, + .fixup_activate = urb_fixup_activate, + .fixup_free = urb_fixup_free, +}; + +static void debug_urb_init(struct urb *urb) +{ + /** +* The struct urb structure must never be +* created statically, so no init object +* on stack case. 
+*/ + debug_object_init(urb, &urb_debug_descr); +} + +int debug_urb_activate(struct urb *urb) +{ + return debug_object_activate(urb, &urb_debug_descr); +} + +void debug_urb_deactivate(struct urb *urb) +{ + debug_object_deactivate(urb, &urb_debug_descr); +} + +#else +static inline void debug_urb_init(struct urb *urb) { } +#endif static void urb_destroy(struct kref *kref) { @@ -41,6 +135,7 @@ void usb_init_urb(struct urb *urb) memset(urb, 0, sizeof(*urb)); kref_init(&urb->kref); INIT_LIST_HEAD(&urb->anchor_list); + debug_urb_init(urb); } } EXPORT_SYMBOL_GPL(usb_init_urb); @@ -331,6 +426,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags) struct usb_host_endpoint*ep; int is_out; unsigned intallowed; + int ret; if (!urb || !urb->complete) return -EINVAL; @@ -539,10 +635,23 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags) } } - return usb_hcd_submit_urb(urb, mem_flags); + ret = debug_urb_activate(urb); + if (ret) + return ret; + ret = usb_hcd_submit_urb(urb, mem_flags); + if (ret) + debug_urb_deactivate(urb); + + return ret; } EXPORT_SYMBOL_GPL(usb_submit_urb); +static inline int __usb_unlink_urb(struct urb *urb, int status) +{ + debug_urb_deactivate(urb); + return usb_hcd_unlink_urb(urb, status); +} + /*
Re: [PATCH v2 03/12] of: add J-Core interrupt controller bindings
On 23/05/16 22:13, Rich Felker wrote: > On Mon, May 23, 2016 at 03:53:20PM -0500, Rob Herring wrote: >> On Fri, May 20, 2016 at 02:53:04AM +, Rich Felker wrote: >>> Signed-off-by: Rich Felker >>> --- >>> .../bindings/interrupt-controller/jcore,aic.txt| 28 >>> ++ >>> 1 file changed, 28 insertions(+) >>> create mode 100644 >>> Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt >>> >>> diff --git >>> a/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt >>> b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt >>> new file mode 100644 >>> index 000..dc9fde8 >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/interrupt-controller/jcore,aic.txt >>> @@ -0,0 +1,28 @@ >>> +J-Core Advanced Interrupt Controller >>> + >>> +Required properties: >>> + >>> +- compatible : Should be "jcore,aic1" for the (obsolete) first-generation >>> aic >>> + with 8 interrupt lines with programmable priorities, or "jcore,aic2" for >>> + the "aic2" core with 64 interrupts. >>> + >>> +- interrupt-controller : Identifies the node as an interrupt controller >>> + >>> +- #interrupt-cells : Specifies the number of cells needed to encode an >>> + interrupt source. The value shall be 1. >> >> No level/edge support? Need 2 cells if so. > > No, all the logic is in hardware. From the software side you just need > handle_simple_irq or equivalent. Not even an EOI? M. -- Jazz is not dead. It just smells funny...
Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up
On Tue, May 24, 2016 at 08:04:24AM +0800, Yuyang Du wrote: > On Mon, May 23, 2016 at 11:58:51AM +0100, Morten Rasmussen wrote: > > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if > > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric > > configurations SD_WAKE_AFFINE is only desirable if the waking task's > > compute demand (utilization) is suitable for the cpu capacities > > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup > > balancing take over (find_idlest_{group, cpu}()). > > > > The assumption is that SD_WAKE_AFFINE is never set for a sched_domain > > containing cpus with different capacities. This is enforced by a > > previous patch based on the SD_ASYM_CPUCAPACITY flag. > > > > Ideally, we shouldn't set 'want_affine' in the first place, but we don't > > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start > > traversing them. > > > > cc: Ingo Molnar > > cc: Peter Zijlstra > > > > Signed-off-by: Morten Rasmussen > > --- > > kernel/sched/fair.c | 28 +++- > > 1 file changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 564215d..ce44fa7 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -114,6 +114,12 @@ unsigned int __read_mostly sysctl_sched_shares_window > > = 1000UL; > > unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL; > > #endif > > > > +/* > > + * The margin used when comparing utilization with cpu capacity: > > + * util * 1024 < capacity * margin > > + */ > > +unsigned int capacity_margin = 1280; /* ~20% */ > > + > > static inline void update_load_add(struct load_weight *lw, unsigned long > > inc) > > { > > lw->weight += inc; > > @@ -5293,6 +5299,25 @@ static int cpu_util(int cpu) > > return (util >= capacity) ? capacity : util; > > } > > > > +static inline int task_util(struct task_struct *p) > > +{ > > + return p->se.avg.util_avg; > > +} > > + > > +static int wake_cap(struct task_struct *p, int cpu, int prev_cpu) > > +{ > > + long delta; > > + long prev_cap = capacity_of(prev_cpu); > > + > > + delta = cpu_rq(cpu)->rd->max_cpu_capacity - prev_cap; > > + > > + /* prev_cpu is fairly close to max, no need to abort wake_affine */ > > + if (delta < prev_cap >> 3) > > + return 0; > > delta can be negative? still return 0? I could add an abs() around delta. Do you have a specific scenario in mind? Under normal circumstances, I don't think it can be negative?
[RESEND PATCH] perf tools: Add arch/*/include/generated/ to .gitignore
Commit 1b700c9975008615ad470cf79acc8455ce60a695 ("perf tools: Build syscall table .c header from kernel's syscall_64.tbl") automatically generates per-arch syscall table arrays, e.g. arch/x86/include/generated/asm/syscalls_64.c. So add this directory to .gitignore. Cc: Namhyung Kim Cc: Jiri Olsa Cc: Masami Hiramatsu Cc: Adrian Hunter Cc: David Ahern Cc: Wang Nan Cc: Alexander Shishkin Signed-off-by: Taeung Song --- tools/perf/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore index 3d1bb80..4bef135 100644 --- a/tools/perf/.gitignore +++ b/tools/perf/.gitignore @@ -30,3 +30,4 @@ config.mak.autogen *.pyo .config-detected util/intel-pt-decoder/inat-tables.c +arch/*/include/generated/ \ No newline at end of file -- 2.5.0
Re: [PATCH 04/16] sched/fair: Optimize find_idlest_cpu() when there is no choice
On Tue, 2016-05-24 at 09:05 +0100, Morten Rasmussen wrote: > On Tue, May 24, 2016 at 08:29:05AM +0200, Mike Galbraith wrote: > > > + /* Check if we have any choice */ > > > + if (group->group_weight == 1) { > > > + return cpumask_first(sched_group_cpus(group)); > > > + } > > > + > > Hm, if task isn't allowed there, too bad? > Is that possible for single-cpu groups? I thought we skipped groups with > no cpus allowed in find_idlest_group(): > > /* Skip over this group if it has no CPUs allowed */ > if (!cpumask_intersects(sched_group_cpus(group), > tsk_cpus_allowed(p))) > continue; > > Since the group has at least one cpu allowed and only contains one cpu, > that cpu must be allowed. No? Yup, you're right, handled before we got there. -Mike
Re: v4.6 kernel BUG at mm/rmap.c:1101!
On Mon, May 23, 2016 at 05:08:26PM +0200, Andrea Arcangeli wrote: > On Mon, May 23, 2016 at 05:06:38PM +0300, Mika Westerberg wrote: > > Hi, > > > > After upgrading the kernel of my desktop system from v4.6-rc7 to v4.6, I've > > started seeing the following: > > > > [176611.093747] page:ea36 count:1 mapcount:0 > > mapping:880034d2e0a1 index:0x1f9b06600 compound_mapcount: 0 > > [176611.093751] flags: > > 0x3fff844079(locked|uptodate|dirty|lru|active|head|swapbacked) > > [176611.093752] page dumped because: VM_BUG_ON_PAGE(page->index != > > linear_page_index(vma, address)) > > [176611.093753] page->mem_cgroup:88049e81b800 > > [176611.093765] [ cut here ] > This is a split pmd tail that is triggering a COW, but it's still a > compound page because the physical split didn't happen yet. > > So like Kirill correctly pointed out, in such a case we have to do > compound_head because the page->mapping that has to be refiled to the > local anon_vma is in the head. > > It's just a false positive VM_BUG_ON, the code itself is correct. OK, thanks for the explanation. > Production kernels should be built with CONFIG_DEBUG_VM=n so this is > not going to affect them and there's no bug for the production builds. Hmm, the kernel shipped with Fedora 23 has that enabled: lahna % grep CONFIG_DEBUG_VM /boot/config-4.4.9-300.fc23.x86_64 CONFIG_DEBUG_VM=y # CONFIG_DEBUG_VM_VMACACHE is not set # CONFIG_DEBUG_VM_RB is not set > Can you test this to shut off the false positive? I'm testing with Kirill's patch (because he sent it first ;-)) and will let you know what happens. Thanks!
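For reference, the fix both replies point at amounts to refiling on the head page; a hedged sketch of that shape in page_move_anon_rmap() (the exact upstream patch may differ in details):

	void page_move_anon_rmap(struct page *page,
				 struct vm_area_struct *vma, unsigned long address)
	{
		struct anon_vma *anon_vma = vma->anon_vma;

		/*
		 * A PMD-split THP tail is still a compound page whose mapping
		 * (and index) live in the head, so check and refile there.
		 */
		page = compound_head(page);

		VM_BUG_ON_PAGE(!PageLocked(page), page);
		VM_BUG_ON_VMA(!anon_vma, vma);

		anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
		page->mapping = (struct address_space *) anon_vma;
	}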
[PATCH V2] MAINTAINERS: Add Dialog PMIC search terms for missing documentation and header files
From: Steve Twiss Dialog Semiconductor support would like to follow these files by adding to the existing MAINTAINERS search terms. The update will allow us to follow the PMIC documentation bindings and header files. The full list is: DT bindings - Documentation/devicetree/bindings/mfd/da9052-i2c.txt - Documentation/devicetree/bindings/mfd/da9055.txt - Documentation/devicetree/bindings/mfd/da9062.txt - Documentation/devicetree/bindings/mfd/da9063.txt - Documentation/devicetree/bindings/mfd/da9150.txt - Documentation/devicetree/bindings/regulator/da9210.txt - Documentation/devicetree/bindings/regulator/da9211.txt Header files - include/linux/mfd/da9062/core.h - include/linux/mfd/da9062/registers.h - include/linux/regulator/da9211.h Signed-off-by: Steve Twiss --- Hi Lee & Mark, The majority of these updates are for MFD documentation and headers, although there is a mixture with the regulators as well. I previously sent a patch TO: Lee and CC:'ed Mark, but I think now I need to resend V2 with Mark in the TO: field. https://lkml.org/lkml/2016/5/11/419 I have also updated this V2 patch to align with linux-next/v4.6 and removed the change to the onkey driver. Lee: with Mark's Ack, would it be possible for you to take these through please? Regards, Steve MAINTAINERS | 4 1 file changed, 4 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 9c567a4..742a860 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3574,6 +3574,8 @@ M:Support Opensource W: http://www.dialog-semiconductor.com/products S: Supported F: Documentation/hwmon/da90?? +F: Documentation/devicetree/bindings/mfd/da90*.txt +F: Documentation/devicetree/bindings/regulator/da92*.txt F: Documentation/devicetree/bindings/sound/da[79]*.txt F: drivers/gpio/gpio-da90??.c F: drivers/hwmon/da90??-hwmon.c @@ -3594,8 +3596,10 @@ F: include/linux/mfd/da903x.h F: include/linux/mfd/da9052/ F: include/linux/mfd/da9055/ +F: include/linux/mfd/da9062/ F: include/linux/mfd/da9063/ F: include/linux/mfd/da9150/ +F: include/linux/regulator/da9211.h F: include/sound/da[79]*.h F: sound/soc/codecs/da[79]*.[ch] -- end-of-patch for PATCH V2
Re: [PATCH v6 11/12] zsmalloc: page migration support
On Tue, May 24, 2016 at 05:05:11PM +0900, Sergey Senozhatsky wrote: > Hello, > > On (05/24/16 15:28), Minchan Kim wrote: > [..] > > The most important point to me is that it makes the code *simple* at the cost of > > additional wasted memory. Now, every zspage lives in *a* list so we don't > > need to check the zspage's groupness to use list_empty of zspage. > > I'm not sure how much you feel it simplifies the code. > > However, while I implemented the page migration logic, the check with the condition > > that a zspage's groupness is either almost_empty or almost_full was really > > bogus and tricky to me, so I had to debug several times to find what was > > wrong. > > > > Compared to the old code, zsmalloc is getting more complicated day by day, so I want to weight > > toward *simple* for easy maintenance. > > > > One more note: > > Now, ZS_EMPTY is used as a pool. Look at find_get_zspage. So adding > > an "empty" column to ZSMALLOC_STAT might be worthwhile, but I wanted to handle it > > as another topic. > > > > So if you don't feel strongly that the saving is really huge, I want to > > go with this. And if we are adding more wasted memory in the future, > > let's handle it then. > > oh, sure, all those micro-optimizations can be done later, > off the series. > > > About CONFIG_ZSMALLOC_STAT, it might be off-topic. Frankly speaking, > > I have guided the production team to enable it because when I profiled the > > overhead caused by ZSMALLOC_STAT, there was no performance loss > > in real workloads. However, the stat gives more detailed useful > > information. > > ok, agree. > good to know that you use stats in production, by the way. > > [..] > > > > + pos = (((class->objs_per_zspage * class->size) * > > > > + page_idx / class->pages_per_zspage) / class->size > > > > + ) * class->size; > > > > > > > > > something went wrong with the indentation here :) > > > > > > so... it's > > > > > > (((class->objs_per_zspage * class->size) * page_idx / > > > class->pages_per_zspage) / class->size ) * class->size; > > > > > > the last ' / class->size ) * class->size' can be dropped, I think. > > > > You prove I didn't learn math. > > Will drop it. > > haha, no, that wasn't the point :) great job with the series! > > [..] > > > hm... zsmalloc is getting sooo complex now. > > > > > > `system_wq' -- can we have problems here when the system is getting > > > low on memory and workers are getting increasingly busy trying to > > > allocate the memory for some other purposes? > > > > > > _theoretically_ zsmalloc can stack a number of ready-to-release zspages, > > > which won't be accessible to zsmalloc, nor will they be released. how > > > likely > > > is this? hm, can zsmalloc take zspages from that deferred release list > > > when > > > it wants to allocate a new zspage? > > > > Done. > > oh, good. that was a purely theoretical thing, and to continue with the > theories, I assume that zs_malloc() will improve with this change. the > sort of problem with zs_malloc(), *I think*, is that we release > the class ->lock after a failed find_get_zspage(): > > handle = cache_alloc_handle(pool, gfp); > if (!handle) > return 0; > > zspage = find_get_zspage(class); > if (likely(zspage)) { > obj = obj_malloc(class, zspage, handle); > [..] > spin_unlock(&class->lock); > > return handle; > } > > spin_unlock(&class->lock); > > zspage = alloc_zspage(pool, class, gfp); > if (!zspage) { > cache_free_handle(pool, handle); > return 0; > } > > spin_lock(&class->lock); > obj = obj_malloc(class, zspage, handle); > [..] 
> spin_unlock(&class->lock); > > > _theoretically_, on a not-really-huge system, let's say 64 CPUs for > example, we can have 64 write paths trying to store objects of size > OBJ_SZ to a ZS_FULL class-OBJSZ. the write path (each of them) will > fail on find_get_zspage(), unlock the class ->lock (so another write > path will have its chance to fail on find_get_zspage()), alloc_zspage(), > create a page chain, spin on class ->lock to add the new zspage to the > class. so we can end up allocating up to 64 zspages, each of which will > carry N PAGE_SIZE pages. those zspages, at least at the beginning, will > store only one object per zspage, which will blow up internal > fragmentation and can cause more compaction/migration/etc later on. well, > it's a bit pessimistic, but I think to _some extent_ this scenario is > quite possible. > > I assume that this "pick an already marked for release zspage" thing is > happening as a fast path within the first class ->lock section, so the > rest of concurrent write requests that are spinning on the class ->lock > at the moment will see a zspage, instead of !find_get_zspage(). As well, we would reduce the page alloc/free cost, although it's not expensive compared to the compression overhead. :) Thanks for giving it thought!
Re: [RFC PATCH 1/2] Input: rotary-encoder- Add support for absolute encoder
Hello, On Tue, May 24, 2016 at 10:39:26AM +0530, Vignesh R wrote: > On 05/23/2016 06:48 PM, Uwe Kleine-König wrote: > > On Mon, May 23, 2016 at 04:48:40PM +0530, R, Vignesh wrote: > >> On 5/22/2016 3:56 PM, Uwe Kleine-König wrote: > >>> On Thu, May 19, 2016 at 02:34:00PM +0530, Vignesh R wrote: > +- rotary-encoder,absolute-encoder: support encoders where GPIO lines > + reflect the actual position of the rotary encoder dial. For example, > + if dial points to 9, then four GPIO lines read HLLH(1001b = 9). > + In this case, rotary-encoder,steps-per-period need not be defined. > >>> > >>> IMHO this is wrong, I'd formalize this device as: > >>> > >>> { > >>> compatible = "rotary-encoder"; > >>> gpios = <&gpio 19 1>, <&gpio 20 0>, <...>, <...>; > >>> rotary-encoder,encoding = "binary"; > >>> rotary-encoder,steps = <16>; > >>> rotary-encoder,steps-per-period = <16>; > >> > >> The above binding essentially means a quarter_period device. I would not > >> like to bother with all the logic in rotary_encoder_quarter_period_irq() > >> when we can know encoder->pos by directly reading state of gpio lines. > > > > OK, we have code that is more complex than it needs to be for your > > device. But your device is a special case of the supported devices, so > > I'd say don't worry that there is more logic in the driver than you > > need and be happy. > > More complexity is just overhead. Since the encoder can be turned at a > rate faster than the handling of IRQs (rotary_encoder_quarter_period_irq() > is a threaded IRQ, hence its priority is not close to real time), some states This problem isn't unique to your hardware. An "ordinary" encoder with just two GPIOs and more than one period can be rotated faster than 1/irq_latency, too. There are two things that can be done: - undo the conversion to threaded irqs; or - read out the gpios in the fast handler and only delay decoding and reporting of the event Both approaches have their disadvantages. > can be missed. rotary_encoder_quarter_period_irq() is not robust in this > case; reading the gpios directly is the more suitable option. I see similar > views expressed previously [1] > > [1]http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/391196.html IMHO the right thing to do is to improve rotary_encoder_quarter_period_irq (and also the other handlers for full and half period mode) to make use of additional GPIOs. This way all types of devices benefit and more code is shared. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
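For the absolute-encoder case, the core read-out is small either way; a hedged sketch of decoding the dial position from N lines with the gpiod array API (binary encoding; the helper name is illustrative):

	static unsigned int rotary_encoder_get_abs_pos(struct gpio_descs *gpios)
	{
		unsigned int i, pos = 0;

		/* each line contributes one bit: four lines reading HLLH give 1001b = 9 */
		for (i = 0; i < gpios->ndescs; i++)
			if (gpiod_get_value_cansleep(gpios->desc[i]))
				pos |= 1U << i;

		return pos;
	}

The same read-out would also serve the suggestion above of teaching the existing full/half/quarter period handlers about additional GPIOs, since they need the raw multi-bit state before decoding too.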
[PATCH V3 1/2] usb: musb: Ensure rx reinit occurs for shared_fifo endpoints
Previously, shared_fifo endpoints would only get a previous tx state cleared out; the rx state was only cleared for non-shared_fifo endpoints. Change this so that the rx state is cleared for all endpoints. This addresses an issue that resulted in rx packets being dropped silently. Signed-off-by: Andrew Goodbody Cc: sta...@vger.kernel.org --- V3 no change V2 removed debugging call drivers/usb/musb/musb_host.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c index b7a02ce..e5b6aba 100644 --- a/drivers/usb/musb/musb_host.c +++ b/drivers/usb/musb/musb_host.c @@ -594,14 +594,13 @@ musb_rx_reinit(struct musb *musb, struct musb_qh *qh, u8 epnum) musb_writew(ep->regs, MUSB_TXCSR, 0); /* scrub all previous state, clearing toggle */ - } else { - csr = musb_readw(ep->regs, MUSB_RXCSR); - if (csr & MUSB_RXCSR_RXPKTRDY) - WARNING("rx%d, packet/%d ready?\n", ep->epnum, - musb_readw(ep->regs, MUSB_RXCOUNT)); - - musb_h_flush_rxfifo(ep, MUSB_RXCSR_CLRDATATOG); } + csr = musb_readw(ep->regs, MUSB_RXCSR); + if (csr & MUSB_RXCSR_RXPKTRDY) + WARNING("rx%d, packet/%d ready?\n", ep->epnum, + musb_readw(ep->regs, MUSB_RXCOUNT)); + + musb_h_flush_rxfifo(ep, MUSB_RXCSR_CLRDATATOG); /* target addr and (for multipoint) hub addr/port */ if (musb->is_multipoint) { -- 2.7.4
[PATCH V3 2/2] usb: musb: Stop bulk endpoint while queue is rotated
Ensure that the endpoint is stopped by clearing REQPKT before clearing DATAERR_NAKTIMEOUT, prior to rotating the queue on the dedicated bulk endpoint. This addresses an issue where a race could result in the endpoint receiving data before it was reprogrammed, resulting in a warning about such data from musb_rx_reinit before the data was thrown away. The data thrown away was a valid packet that had been correctly ACKed, which meant the host and device got out of sync. Signed-off-by: Andrew Goodbody Cc: sta...@vger.kernel.org --- V3 removed the old comment, moved the new comment in place of the old one and updated it to better reference the programmers guide V2 added comment about clearing REQPKT before DATAERR_NAKTIMEOUT drivers/usb/musb/musb_host.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c index e5b6aba..17421d0 100644 --- a/drivers/usb/musb/musb_host.c +++ b/drivers/usb/musb/musb_host.c @@ -994,9 +994,15 @@ static void musb_bulk_nak_timeout(struct musb *musb, struct musb_hw_ep *ep, if (is_in) { dma = is_dma_capable() ? ep->rx_channel : NULL; - /* clear nak timeout bit */ + /* +* Need to stop the transaction by clearing REQPKT first +* then the NAK Timeout bit ref MUSBMHDRC USB 2.0 HIGH-SPEED +* DUAL-ROLE CONTROLLER Programmer's Guide, section 9.2.2 +*/ rx_csr = musb_readw(epio, MUSB_RXCSR); rx_csr |= MUSB_RXCSR_H_WZC_BITS; + rx_csr &= ~MUSB_RXCSR_H_REQPKT; + musb_writew(epio, MUSB_RXCSR, rx_csr); rx_csr &= ~MUSB_RXCSR_DATAERROR; musb_writew(epio, MUSB_RXCSR, rx_csr); -- 2.7.4
[PATCH V3 0/2] usb: musb: fix dropped packets
The musb driver can drop rx packets when heavily loaded. These two patches address two issues that can cause this. Both issues arose when an endpoint was reprogrammed. The first patch fixes a logic bug that resulted in a shared_fifo in rx mode not having its state cleared out. The second patch fixes a race condition caused by not stopping the dedicated endpoint for bulk packets before rotating its queue, which allowed a packet to be received and then thrown away. V3 Updated the comment to better reference the manual V2 added a comment and removed debugging code Andrew Goodbody (2): usb: musb: Ensure rx reinit occurs for shared_fifo endpoints usb: musb: Stop bulk endpoint while queue is rotated drivers/usb/musb/musb_host.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) -- 2.7.4
Re: [PATCH v3 2/9] powerpc/powernv: Rename idle_power7.S to idle_power_common.S
On Mon, May 23, 2016 at 08:48:35PM +0530, Shreyas B. Prabhu wrote: > idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next > patch for POWER9. Rename the file to a non-hardware specific > name. > > Signed-off-by: Shreyas B. Prabhu Reviewed-by: Gautham R. Shenoy
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
Hi Tomeu, Patch subject: please put the version into the brackets, so [PATCH v5] as it shouldn't be part of the commit log. Am Dienstag, 24. Mai 2016, 09:27:37 schrieb Tomeu Vizoso: > As per the docs, atomic_commit should return -EBUSY "if an asycnhronous > updated is requested and there is an earlier updated pending". > v2: Use the status of the workqueue instead of vop->event, and don't add > a superfluous wait on the workqueue. > > v3: Drop work_busy, as there's a sizeable delay when the worker > finishes, which introduces a race in which the client has already > received the last flip event but the next page flip ioctl will still > return -EBUSY because work_busy returns outdated information. > > v4: Hold dev->event_lock while checking the VOP's event field as > suggested by Daniel Stone. > > v5: Only block if there's outstanding work if it's a blocking call. similarly, please put the changelog below the "---" and above the diffstat. > Signed-off-by: Tomeu Vizoso > --- aka here. > drivers/gpu/drm/rockchip/rockchip_drm_drv.h | 1 + > drivers/gpu/drm/rockchip/rockchip_drm_fb.c | 25 > ++--- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | > 6 ++ > 3 files changed, 29 insertions(+), 3 deletions(-) Heiko
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
On Tue, May 24, 2016 at 10:28:42AM +0200, Heiko Stuebner wrote: > Hi Tomeu, > > Patch subject: please put the version into the brackets, so [PATCH v5] as it > shouldn't be part of the commit log. > > Am Dienstag, 24. Mai 2016, 09:27:37 schrieb Tomeu Vizoso: > > As per the docs, atomic_commit should return -EBUSY "if an asycnhronous > > updated is requested and there is an earlier updated pending". > > > v2: Use the status of the workqueue instead of vop->event, and don't add > > a superfluous wait on the workqueue. > > > > v3: Drop work_busy, as there's a sizeable delay when the worker > > finishes, which introduces a race in which the client has already > > received the last flip event but the next page flip ioctl will still > > return -EBUSY because work_busy returns outdated information. > > > > v4: Hold dev->event_lock while checking the VOP's event field as > > suggested by Daniel Stone. > > > > v5: Only block if there's outstanding work if it's a blocking call. > > similarly, please put the changelog below the "---" and above the diffstat. drm culture is to keep it above, since it's kinda useful sometimes when later on trying to reconstruct wtf was discussed and why a patch was merged. -Daniel > > > > Signed-off-by: Tomeu Vizoso > > --- > > aka here. > > > drivers/gpu/drm/rockchip/rockchip_drm_drv.h | 1 + > > drivers/gpu/drm/rockchip/rockchip_drm_fb.c | 25 > > ++--- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | > > 6 ++ > > 3 files changed, 29 insertions(+), 3 deletions(-) > > Heiko > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v3 3/9] powerpc/powernv: Rename reusable idle functions to hardware agnostic names
On Mon, May 23, 2016 at 08:48:36PM +0530, Shreyas B. Prabhu wrote: > Functions like power7_wakeup_loss, power7_wakeup_noloss, > power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can > also be used by POWER9. Hence rename these functions to hardware-agnostic > names. > > Suggested-by: Gautham R. Shenoy > Signed-off-by: Shreyas B. Prabhu > --- > New in v3 > > arch/powerpc/kernel/exceptions-64s.S| 6 +++--- > arch/powerpc/kernel/idle_power_common.S | 16 > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 ++-- > 3 files changed, 13 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 4a74d6a..a0da627 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -108,7 +108,7 @@ BEGIN_FTR_SECTION > > cmpwi cr3,r13,2 > GET_PACA(r13) > - bl power7_restore_hyp_resource > + bl pnv_restore_hyp_resource > > li r0,PNV_THREAD_RUNNING > stb r0,PACA_THREAD_IDLE_STATE(r13) /* Clear thread state */ > @@ -128,8 +128,8 @@ BEGIN_FTR_SECTION > /* Return SRR1 from power7_nap() */ > mfspr r3,SPRN_SRR1 > blt cr3,2f > - b power7_wakeup_loss > -2: b power7_wakeup_noloss > + b pnv_wakeup_loss > +2: b pnv_wakeup_noloss > > 9: > END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) > diff --git a/arch/powerpc/kernel/idle_power_common.S > b/arch/powerpc/kernel/idle_power_common.S > index db59613..973c9a1 100644 > --- a/arch/powerpc/kernel/idle_power_common.S > +++ b/arch/powerpc/kernel/idle_power_common.S The comment at the beginning of idle_power_common.S still reads "This file contains the power_save function for Power7 CPUs." Please update that as well. Reviewed-by: Gautham R. Shenoy
Re: [PATCH] Input: pwm-beeper - fix: scheduling while atomic
On 2016-05-20 18:59, Dmitry Torokhov wrote: > Hi Manfred, > > On Wed, May 18, 2016 at 05:16:49PM +0200, Manfred Schlaegl wrote: >> @@ -133,6 +149,8 @@ static int pwm_beeper_remove(struct platform_device >> *pdev) >> { >> struct pwm_beeper *beeper = platform_get_drvdata(pdev); >> >> +cancel_work_sync(&beeper->work); >> + >> input_unregister_device(beeper->input); > > This is racy, request to play may come in after cancel_work_sync() > returns but before we unregistered input device. I think you want the > version below. > Hi Dmitry, yes you are right. Thank you for your feedback. I also see that point, but I think it would be a simpler change just to cancel the worker after unregistering the device (to reorder cancel_work_sync and input_unregister_device). Patch will follow shortly. What do you think? Sincerely, Manfred
Re: [PATCH] hwrng: stm32 - fix build warning
On Tuesday, May 24, 2016 9:59:41 AM CEST Maxime Coquelin wrote: > 2016-05-23 22:35 GMT+02:00 Arnd Bergmann : > > On Monday, May 23, 2016 6:14:08 PM CEST Sudip Mukherjee wrote: > >> diff --git a/drivers/char/hw_random/stm32-rng.c > >> b/drivers/char/hw_random/stm32-rng.c > >> index 92a8106..0533370 100644 > >> --- a/drivers/char/hw_random/stm32-rng.c > >> +++ b/drivers/char/hw_random/stm32-rng.c > >> @@ -52,7 +52,7 @@ static int stm32_rng_read(struct hwrng *rng, void *data, > >> size_t max, bool wait) > >> { > >> struct stm32_rng_private *priv = > >> container_of(rng, struct stm32_rng_private, rng); > >> - u32 sr; > >> + u32 sr = 0; > >> int retval = 0; > >> > >> pm_runtime_get_sync((struct device *) priv->rng.priv); > > > > Does this work as well? > > > > diff --git a/drivers/char/hw_random/stm32-rng.c > > b/drivers/char/hw_random/stm32-rng.c > > index 92a810648bd0..5c836b0afa40 100644 > > --- a/drivers/char/hw_random/stm32-rng.c > > +++ b/drivers/char/hw_random/stm32-rng.c > > @@ -79,7 +79,7 @@ static int stm32_rng_read(struct hwrng *rng, void *data, > > size_t max, bool wait) > > max -= sizeof(u32); > > } > > > > - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > > + if (WARN_ONCE(retval > 0 && (sr & (RNG_SR_SEIS | RNG_SR_CEIS)), > > "bad RNG status - %x\n", sr)) > > writel_relaxed(0, priv->base + RNG_SR); > > > > I think it would be nicer to not add a bogus initialization. > Hmm, not sure this is nicer. > The while loop can break before retval is incremented when the sr value is > not the expected one (sr != RNG_SR_DRDY). > In that case, we certainly want to print the sr value. Ah, you are right. > Maybe the better way is just to initialize sr with the status register content? >pm_runtime_get_sync((struct device *) priv->rng.priv); >+ sr = readl_relaxed(priv->base + RNG_SR); >while (max > sizeof(u32)) { >- sr = readl_relaxed(priv->base + RNG_SR); >if (!sr && wait) { >unsigned int timeout = RNG_TIMEOUT; I think that introduces a bug: you really want to read the status register on each loop iteration. How about moving the error handling into the loop itself? Arnd diff --git a/drivers/char/hw_random/stm32-rng.c b/drivers/char/hw_random/stm32-rng.c index 92a810648bd0..fceacd809462 100644 --- a/drivers/char/hw_random/stm32-rng.c +++ b/drivers/char/hw_random/stm32-rng.c @@ -59,6 +59,10 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) while (max > sizeof(u32)) { sr = readl_relaxed(priv->base + RNG_SR); + if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), + "bad RNG status - %x\n", sr)) + writel_relaxed(0, priv->base + RNG_SR); + if (!sr && wait) { unsigned int timeout = RNG_TIMEOUT; @@ -79,10 +83,6 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) max -= sizeof(u32); } - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), - "bad RNG status - %x\n", sr)) - writel_relaxed(0, priv->base + RNG_SR); - pm_runtime_mark_last_busy((struct device *) priv->rng.priv); pm_runtime_put_sync_autosuspend((struct device *) priv->rng.priv);
Re: [PATCH 2/4] ARM: dma-mapping: Constify attrs passed to internal functions
On Tue, May 24, 2016 at 08:28:08AM +0200, Krzysztof Kozlowski wrote: > Some of the non-exported functions do not modify passed dma_attrs so the > pointer can point to const data. > > Signed-off-by: Krzysztof Kozlowski Acked-by: Russell King Thanks. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Re: [PATCH] Input: pwm-beeper - fix: scheduling while atomic
Pwm config may sleep so defer it using a worker. On a Freescale i.MX53 based board we ran into "BUG: scheduling while atomic" because input_inject_event locks interrupts, but imx_pwm_config_v2 sleeps. Tested on Freescale i.MX53 SoC with 4.6.0 and 4.1.24. Signed-off-by: Manfred Schlaegl --- drivers/input/misc/pwm-beeper.c | 54 +++-- 1 file changed, 36 insertions(+), 18 deletions(-) diff --git a/drivers/input/misc/pwm-beeper.c b/drivers/input/misc/pwm-beeper.c index f2261ab..014495d3 100644 --- a/drivers/input/misc/pwm-beeper.c +++ b/drivers/input/misc/pwm-beeper.c @@ -20,21 +20,41 @@ #include #include #include +#include struct pwm_beeper { struct input_dev *input; struct pwm_device *pwm; + struct work_struct work; unsigned long period; }; #define HZ_TO_NANOSECONDS(x) (10UL/(x)) +static void __pwm_beeper_set(struct pwm_beeper *beeper) +{ + unsigned long period = beeper->period; + + pwm_config(beeper->pwm, period / 2, period); + + if (period == 0) + pwm_disable(beeper->pwm); + else + pwm_enable(beeper->pwm); +} + +static void pwm_beeper_work(struct work_struct *work) +{ + struct pwm_beeper *beeper = + container_of(work, struct pwm_beeper, work); + + __pwm_beeper_set(beeper); +} + static int pwm_beeper_event(struct input_dev *input, unsigned int type, unsigned int code, int value) { - int ret = 0; struct pwm_beeper *beeper = input_get_drvdata(input); - unsigned long period; if (type != EV_SND || value < 0) return -EINVAL; @@ -49,18 +69,12 @@ static int pwm_beeper_event(struct input_dev *input, return -EINVAL; } - if (value == 0) { - pwm_disable(beeper->pwm); - } else { - period = HZ_TO_NANOSECONDS(value); - ret = pwm_config(beeper->pwm, period / 2, period); - if (ret) - return ret; - ret = pwm_enable(beeper->pwm); - if (ret) - return ret; - beeper->period = period; - } + if (value == 0) + beeper->period = 0; + else + beeper->period = HZ_TO_NANOSECONDS(value); + + schedule_work(&beeper->work); return 0; } @@ -87,6 +101,8 @@ static int pwm_beeper_probe(struct platform_device *pdev) goto err_free; } + INIT_WORK(&beeper->work, pwm_beeper_work); + beeper->input = input_allocate_device(); if (!beeper->input) { dev_err(&pdev->dev, "Failed to allocate input device\n"); @@ -135,6 +151,8 @@ static int pwm_beeper_remove(struct platform_device *pdev) input_unregister_device(beeper->input); + cancel_work_sync(&beeper->work); + pwm_disable(beeper->pwm); pwm_free(beeper->pwm); @@ -147,6 +165,8 @@ static int __maybe_unused pwm_beeper_suspend(struct device *dev) { struct pwm_beeper *beeper = dev_get_drvdata(dev); + cancel_work_sync(&beeper->work); + if (beeper->period) pwm_disable(beeper->pwm); @@ -157,10 +177,8 @@ static int __maybe_unused pwm_beeper_resume(struct device *dev) { struct pwm_beeper *beeper = dev_get_drvdata(dev); - if (beeper->period) { - pwm_config(beeper->pwm, beeper->period / 2, beeper->period); - pwm_enable(beeper->pwm); - } + if (beeper->period) + __pwm_beeper_set(beeper); return 0; } -- 2.1.4
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
On Tue, May 24, 2016 at 10:30:50AM +0200, Daniel Vetter wrote:
> On Tue, May 24, 2016 at 10:28:42AM +0200, Heiko Stuebner wrote:
> > Hi Tomeu,
> >
> > Patch subject: please put the version into the brackets, so [PATCH v5] as it
> > shouldn't be part of the commit log.
> >
> > Am Dienstag, 24. Mai 2016, 09:27:37 schrieb Tomeu Vizoso:
> > > As per the docs, atomic_commit should return -EBUSY "if an asycnhronous
> > > updated is requested and there is an earlier updated pending".
> > >
> > > v2: Use the status of the workqueue instead of vop->event, and don't add
> > > a superfluous wait on the workqueue.
> > >
> > > v3: Drop work_busy, as there's a sizeable delay when the worker
> > > finishes, which introduces a race in which the client has already
> > > received the last flip event but the next page flip ioctl will still
> > > return -EBUSY because work_busy returns outdated information.
> > >
> > > v4: Hold dev->event_lock while checking the VOP's event field as
> > > suggested by Daniel Stone.
> > >
> > > v5: Only block if there's outstanding work if it's a blocking call.
> >
> > similarly, please put the changelog below the "---" and above the diffstat.
>
> drm culture is to keep it above, since it's kinda useful sometimes when
> later on trying to reconstruct wtf was discussed and why a patch was
> merged.

Maybe needs a bit more context: The only stuff you raised in your review
is tiny style nits of pretty much utter irrelevance. No substantial and
material feedback anywhere, and in my opinion in such a case either fix
up the nits when applying (when you feel really strongly about perfect
patches), or just merge as-is.

But sending out content-less bikesheds like these just adds noise and
helps no-one. I think at least some spelling stuff is the minimal bar (but
then just include your r-b tag), but personally I don't even care about
that so much, as long as it's still legible.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v3] Coccinelle: noderef: Add new rules and correct the old rule
Acked-by: Julia Lawall

On Tue, 24 May 2016, Vaishali Thakkar wrote:

> Add new rules to detect the cases where sizeof is used in
> function calls as an argument.
>
> Also, for the patch mode the third rule should behave the same as
> the second rule with its arguments reversed. So, change that as well.
>
> Signed-off-by: Vaishali Thakkar
> ---
> Changes since v2:
>  - Add rules for function calls. This will behave as
>    more general rules and covers cases which were
>    covered by the rule in previous versions of the patch
>  - Change subject and commit log accordingly.
> Changes since v1:
>  - Declare i as an expression instead of identifier to
>    cover more cases
> ---
>  scripts/coccinelle/misc/noderef.cocci | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/coccinelle/misc/noderef.cocci b/scripts/coccinelle/misc/noderef.cocci
> index 80a831c..007f0de 100644
> --- a/scripts/coccinelle/misc/noderef.cocci
> +++ b/scripts/coccinelle/misc/noderef.cocci
> @@ -16,6 +16,7 @@ virtual patch
>  @depends on patch@
>  expression *x;
>  expression f;
> +expression i;
>  type T;
>  @@
>
> @@ -30,15 +31,26 @@ f(...,(T)(x),...,sizeof(
>  + *x
>   ),...)
>  |
> -f(...,sizeof(x),...,(T)(
> +f(...,sizeof(
> +- x
> ++ *x
> + ),...,(T)(x),...)
> +|
> +f(...,(T)(x),...,i*sizeof(
>  - x
>  + *x
>   ),...)
> +|
> +f(...,i*sizeof(
> +- x
> ++ *x
> + ),...,(T)(x),...)
>  )
>
>  @r depends on !patch@
>  expression *x;
>  expression f;
> +expression i;
>  position p;
>  type T;
>  @@
> @@ -49,6 +61,10 @@ type T;
>  *f(...,(T)(x),...,sizeof@p(x),...)
>  |
>  *f(...,sizeof@p(x),...,(T)(x),...)
> +|
> +*f(...,(T)(x),...,i*sizeof@p(x),...)
> +|
> +*f(...,i*sizeof@p(x),...,(T)(x),...)
>  )
>
> @script:python depends on org@
> --
> 2.1.4
>
>
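For readers unfamiliar with the rule, a minimal illustration of the
bug class it flags (hypothetical code, not taken from the kernel):

	#include <string.h>

	struct item { int a, b, c; };

	void copy_items(void *dst, struct item *x, size_t n)
	{
		/* buggy: sizeof(x) is the size of the pointer (4 or 8
		 * bytes), not of the object; the rule rewrites this
		 * to sizeof(*x) */
		memcpy(dst, (void *)x, sizeof(x));

		/* the new rules also catch the multiplied form: */
		memcpy(dst, (void *)x, n * sizeof(x));	/* want n * sizeof(*x) */
	}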
Re: bpf: use-after-free in array_map_alloc
[+CC Marco who reported the CVE, forgot that earlier] On 05/23/2016 11:35 PM, Tejun Heo wrote: Hello, Can you please test whether this patch resolves the issue? While adding support for atomic allocations, I reduced alloc_mutex covered region too much. Thanks. Ugh, this makes the code even more head-spinning than it was. diff --git a/mm/percpu.c b/mm/percpu.c index 0c59684..bd2df70 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -162,7 +162,7 @@ static struct pcpu_chunk *pcpu_reserved_chunk; static int pcpu_reserved_chunk_limit; static DEFINE_SPINLOCK(pcpu_lock);/* all internal data structures */ -static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop */ +static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map extension */ static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */ @@ -435,6 +435,8 @@ static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc) size_t old_size = 0, new_size = new_alloc * sizeof(new[0]); unsigned long flags; + lockdep_assert_held(&pcpu_alloc_mutex); I don't see where the mutex gets locked when called via pcpu_map_extend_workfn? (except via the new cancel_work_sync() call below?) Also what protects chunks with scheduled work items from being removed? + new = pcpu_mem_zalloc(new_size); if (!new) return -ENOMEM; @@ -895,6 +897,9 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, return NULL; } + if (!is_atomic) + mutex_lock(&pcpu_alloc_mutex); BTW I noticed that bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL; this is too pessimistic IMHO. Reclaim is possible even without __GFP_FS and __GFP_IO. Could you just use gfpflags_allow_blocking(gfp) here? + spin_lock_irqsave(&pcpu_lock, flags); /* serve reserved allocations from the reserved chunk if available */ @@ -967,12 +972,11 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, if (is_atomic) goto fail; - mutex_lock(&pcpu_alloc_mutex); + lockdep_assert_held(&pcpu_alloc_mutex); if (list_empty(&pcpu_slot[pcpu_nr_slots - 1])) { chunk = pcpu_create_chunk(); if (!chunk) { - mutex_unlock(&pcpu_alloc_mutex); err = "failed to allocate new chunk"; goto fail; } @@ -983,7 +987,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, spin_lock_irqsave(&pcpu_lock, flags); } - mutex_unlock(&pcpu_alloc_mutex); goto restart; area_found: @@ -993,8 +996,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, if (!is_atomic) { int page_start, page_end, rs, re; - mutex_lock(&pcpu_alloc_mutex); - page_start = PFN_DOWN(off); page_end = PFN_UP(off + size); @@ -1005,7 +1006,6 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, spin_lock_irqsave(&pcpu_lock, flags); if (ret) { - mutex_unlock(&pcpu_alloc_mutex); pcpu_free_area(chunk, off, &occ_pages); err = "failed to populate"; goto fail_unlock; @@ -1045,6 +1045,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, /* see the flag handling in pcpu_blance_workfn() */ pcpu_atomic_alloc_failed = true; pcpu_schedule_balance_work(); + } else { + mutex_unlock(&pcpu_alloc_mutex); } return NULL; } @@ -1137,6 +1139,8 @@ static void pcpu_balance_workfn(struct work_struct *work) list_for_each_entry_safe(chunk, next, &to_free, list) { int rs, re; + cancel_work_sync(&chunk->map_extend_work); This deserves some comment? 
+
 		pcpu_for_each_pop_region(chunk, rs, re, 0, pcpu_unit_pages) {
 			pcpu_depopulate_chunk(chunk, rs, re);
 			spin_lock_irq(&pcpu_lock);
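For reference, the gfpflags_allow_blocking() suggestion above would be
something like this (gfpflags_allow_blocking() just tests
__GFP_DIRECT_RECLAIM):

	-	bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
	+	bool is_atomic = !gfpflags_allow_blocking(gfp);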
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
Am Dienstag, 24. Mai 2016, 10:37:49 schrieb Daniel Vetter: > On Tue, May 24, 2016 at 10:30:50AM +0200, Daniel Vetter wrote: > > On Tue, May 24, 2016 at 10:28:42AM +0200, Heiko Stuebner wrote: > > > Hi Tomeu, > > > > > > Patch subject: please put the version into the brackets, so [PATCH v5] > > > as it shouldn't be part of the commit log. > > > > > > Am Dienstag, 24. Mai 2016, 09:27:37 schrieb Tomeu Vizoso: > > > > As per the docs, atomic_commit should return -EBUSY "if an > > > > asycnhronous > > > > updated is requested and there is an earlier updated pending". > > > > > > > > v2: Use the status of the workqueue instead of vop->event, and don't > > > > add > > > > a superfluous wait on the workqueue. > > > > > > > > v3: Drop work_busy, as there's a sizeable delay when the worker > > > > finishes, which introduces a race in which the client has already > > > > received the last flip event but the next page flip ioctl will still > > > > return -EBUSY because work_busy returns outdated information. > > > > > > > > v4: Hold dev->event_lock while checking the VOP's event field as > > > > suggested by Daniel Stone. > > > > > > > > v5: Only block if there's outstanding work if it's a blocking call. > > > > > > similarly, please put the changelog below the "---" and above the > > > diffstat.> > > drm culture is to keep it above, since it's kinda useful sometimes when > > later on trying to reconstruct wtf was discussed and why a patch was > > merged. > > Maybe needs a bit more context: The only stuff you raised in your review > is tiny style nits of pretty much utter irrelevance. No substantial and > material feedback anywehere, and in my opinion in such a case either fix > up the nits when applying (when you feel really strongly about perfect > patches), or just merge as-is. > > But sending out content-less bikesheds like these just adds noise and > helps no-one. I think at least some spelling stuff is the minimal bar (but > then just include your r-b tag), but personally I don't even care about > that so much, as long as it's still legible. ok, will keep that (both mails) in mind for future stuff. Heiko
BUG: scheduling while atomic: cron/715/0x10cac0c0
Somewhere during this merge window, I started sometimes seeing the below during shutdown of my Debian/m68k system running under ARAnyM: BUG: scheduling while atomic: cron/715/0x10cac0c0 Modules linked in: CPU: 0 PID: 715 Comm: cron Not tainted 4.6.0-atari-09955-g55db2ee398e5862f #338 Stack from 10cac074: 10cac074 0037ad52 0003d9b4 0036636e 00cd0814 02cb 10cac0c0 10cac0c0 002f45ba 00cd05c0 0082 00491068 002f42a8 000412c2 10cac0e0 002f47a4 7fff 10cac180 003b5490 00cd05ec 10cac1f0 10cac118 002f63a8 10cac174 10cac180 001ab966 008a6400 10cac130 021a 003bbba0 021a 021a 0002 efa44d50 10cac128 002f4806 7fff 0082 002f4c8a 002f4c9c 7fff Call Trace: [<0003d9b4>] __schedule_bug+0x40/0x54 [<002f45ba>] __schedule+0x312/0x388 [<002f42a8>] __schedule+0x0/0x388 [<000412c2>] prepare_to_wait+0x0/0x52 [<002f47a4>] schedule+0x64/0x82 [<002f63a8>] schedule_timeout+0xda/0x104 [<001ab966>] __radix_tree_lookup+0x5a/0xa4 [<002f4806>] io_schedule_timeout+0x36/0x4a [<002f4c8a>] bit_wait_io+0x0/0x40 [<002f4c9c>] bit_wait_io+0x12/0x40 [<002f493c>] __wait_on_bit+0x46/0x76 [<0006a252>] wait_on_page_bit_killable+0x64/0x6c [<002f4c8a>] bit_wait_io+0x0/0x40 [<000413e2>] wake_bit_function+0x0/0x4e [<0006a3a0>] __lock_page_or_retry+0xde/0x124 [<0021a000>] scsi_scan_host+0xd6/0x196 [<00098a82>] lookup_swap_cache+0x1e/0x48 [<0008ba3e>] handle_mm_fault+0x626/0x7de [<0008f1de>] find_vma+0x0/0x66 [<002f5c8a>] down_read+0x0/0xe [<0006a201>] wait_on_page_bit_killable+0x13/0x6c [<0008f1f4>] find_vma+0x16/0x66 [<6c50>] do_page_fault+0xe6/0x23a [] res_func+0x930/0x141a [<5cc4>] buserr_c+0x190/0x6d4 [ ] res_func+0x930/0x141a [<28f8>] buserr+0x20/0x28 [ ] res_func+0x930/0x141a [<28f8>] buserr+0x20/0x28 [ ] res_func+0x930/0x141a [<28f8>] buserr+0x20/0x28 [ ] res_func+0x930/0x141a [<28f8>] buserr+0x20/0x28 [ ] res_func+0x930/0x141a [<28f8>] buserr+0x20/0x28 Note that my tree does contain some local patches, so you cannot try the exact same commit ID. Unfortunately this isn't reproducible at will, so I cannot bisect it. I saw it first after merging in commit 07be1337b9e8bfcd. Anyone with a clue? Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom
On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > leaking the previously taken css reference. > > > > Have you seen this happening? I was under impression that the page fault > > paths that have oom enabled will not retry allocations. > > filemap_fault will, for readahead. I thought that the readahead is __GFP_NORETRY so we do not trigger OOM killer. > This is rather unlikely, just like the whole oom scenario, so I haven't > faced this leak in production yet, although it's pretty easy to > reproduce using a contrived test. However, even if this leak happened on > my host, I would probably not notice, because currently we have no clear > means of catching css leaks. I'm thinking about adding a file to debugfs > containing brief information about all memory cgroups, including dead > ones, so that we could at least see how many dead memory cgroups are > dangling out there. Yeah, debugfs interface would make some sense. -- Michal Hocko SUSE Labs
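For those following along, the leak comes from mem_cgroup_oom()
unconditionally stashing a new reference; roughly (simplified from
memcontrol.c of that time, not a verbatim quote):

	static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
	{
		if (!current->memcg_may_oom)
			return;
		/*
		 * If a second fault iteration gets here, the pointer
		 * below is overwritten and the css reference taken
		 * for the old value is never put.
		 */
		css_get(&memcg->css);
		current->memcg_in_oom = memcg;
		current->memcg_oom_gfp_mask = mask;
		current->memcg_oom_order = order;
	}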
RE: [char-misc-next 1/2] mei: don't use wake_up_interruptible for wr_ctrl
> > On Mon, May 23, 2016 at 01:07:53PM +, Winkler, Tomas wrote:
> > >
> > > From: Alexander Usyskin
> > >
> > > wr_ctrl waiters are non-interruptible, so they should be woken up
> > > with a call to wake_up and not to wake_up_interruptible.
> > >
> > > This fixes commit:
> > > 7ff4bdd ("mei: fix waiting for wr_ctrl for corner cases.")
> > >
> > > Signed-off-by: Alexander Usyskin
> > > Signed-off-by: Tomas Winkler
> >
> > Hi Greg,
> > I see this fix didn't make it to rc1, so currently the driver is somehow broken.
> > Can you please make an effort and include the fix in the next rc?
>
> Ugh, didn't realize it was totally broken, the patch didn't really say that, sorry.
> I'll queue it up once 4.7-rc1 is out.
>
There's a big difference between somehow broken and totally broken.
Just couldn't resist paraphrasing Miracle Max: "There's a big difference
between mostly dead and all dead" :)
Thanks
Tomas
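For context: wake_up_interruptible() only wakes TASK_INTERRUPTIBLE
sleepers, while wait_event_timeout() and friends sleep in
TASK_UNINTERRUPTIBLE, so such a waiter is simply skipped. A minimal
sketch of the correct pairing (illustrative names, not the actual mei
code):

	static DECLARE_WAIT_QUEUE_HEAD(wr_ctrl_wq);	/* hypothetical */
	static bool wr_ctrl_done;

	/* waiter side: sleeps in TASK_UNINTERRUPTIBLE */
	wait_event_timeout(wr_ctrl_wq, wr_ctrl_done, msecs_to_jiffies(500));

	/* waker side: must use wake_up(); wake_up_interruptible()
	 * would not wake the uninterruptible sleeper above */
	wr_ctrl_done = true;
	wake_up(&wr_ctrl_wq);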
Re: [PATCH] hwrng: stm32 - fix build warning
2016-05-24 10:32 GMT+02:00 Arnd Bergmann : > On Tuesday, May 24, 2016 9:59:41 AM CEST Maxime Coquelin wrote: >> 2016-05-23 22:35 GMT+02:00 Arnd Bergmann : >> > On Monday, May 23, 2016 6:14:08 PM CEST Sudip Mukherjee wrote: >> >> diff --git a/drivers/char/hw_random/stm32-rng.c >> >> b/drivers/char/hw_random/stm32-rng.c >> >> index 92a8106..0533370 100644 >> >> --- a/drivers/char/hw_random/stm32-rng.c >> >> +++ b/drivers/char/hw_random/stm32-rng.c >> >> @@ -52,7 +52,7 @@ static int stm32_rng_read(struct hwrng *rng, void >> >> *data, size_t max, bool wait) >> >> { >> >> struct stm32_rng_private *priv = >> >> container_of(rng, struct stm32_rng_private, rng); >> >> - u32 sr; >> >> + u32 sr = 0; >> >> int retval = 0; >> >> >> >> pm_runtime_get_sync((struct device *) priv->rng.priv); >> > >> > Does this work as well? >> > >> > diff --git a/drivers/char/hw_random/stm32-rng.c >> > b/drivers/char/hw_random/stm32-rng.c >> > index 92a810648bd0..5c836b0afa40 100644 >> > --- a/drivers/char/hw_random/stm32-rng.c >> > +++ b/drivers/char/hw_random/stm32-rng.c >> > @@ -79,7 +79,7 @@ static int stm32_rng_read(struct hwrng *rng, void *data, >> > size_t max, bool wait) >> > max -= sizeof(u32); >> > } >> > >> > - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), >> > + if (WARN_ONCE(retval > 0 && (sr & (RNG_SR_SEIS | RNG_SR_CEIS)), >> > "bad RNG status - %x\n", sr)) >> > writel_relaxed(0, priv->base + RNG_SR); >> > >> > I think it would be nicer to not add a bogus initialization. >> Hmm, no sure this nicer. >> The while loop can break before retval is incremented when sr value is >> not expected (sr != RNG_SR_DRDY). >> In that case, we certainly want to print sr value. > > Ah, you are right. > >> Maybe the better way is just to initialize sr with status register content? > >>pm_runtime_get_sync((struct device *) priv->rng.priv); >> >>+ sr = readl_relaxed(priv->base + RNG_SR); >>while (max > sizeof(u32)) { >>- sr = readl_relaxed(priv->base + RNG_SR); >>if (!sr && wait) { >>unsigned int timeout = RNG_TIMEOUT; > > > I think that introduces a bug: you really want to read the status > register on each loop iteration. Actually, I read the status again at the end of the loop. But my implementation isn't good anyway, because I read the status register one time more every time. > > How about moving the error handling into the loop itself? That would be better, indeed, but there is one problem with your below proposal: > > Arnd > > > diff --git a/drivers/char/hw_random/stm32-rng.c > b/drivers/char/hw_random/stm32-rng.c > index 92a810648bd0..fceacd809462 100644 > --- a/drivers/char/hw_random/stm32-rng.c > +++ b/drivers/char/hw_random/stm32-rng.c > @@ -59,6 +59,10 @@ static int stm32_rng_read(struct hwrng *rng, void *data, > size_t max, bool wait) > > while (max > sizeof(u32)) { > sr = readl_relaxed(priv->base + RNG_SR); > + if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > + "bad RNG status - %x\n", sr)) > + writel_relaxed(0, priv->base + RNG_SR); > + The error handling should be moved after the last status register read. 
> if (!sr && wait) { > unsigned int timeout = RNG_TIMEOUT; > > @@ -79,10 +83,6 @@ static int stm32_rng_read(struct hwrng *rng, void *data, > size_t max, bool wait) > max -= sizeof(u32); > } > > - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > - "bad RNG status - %x\n", sr)) > - writel_relaxed(0, priv->base + RNG_SR); > - > pm_runtime_mark_last_busy((struct device *) priv->rng.priv); > pm_runtime_put_sync_autosuspend((struct device *) priv->rng.priv); diff --git a/drivers/char/hw_random/stm32-rng.c b/drivers/char/hw_random/stm32-rng.c index 92a810648bd0..2a0fc90e4dc3 100644 --- a/drivers/char/hw_random/stm32-rng.c +++ b/drivers/char/hw_random/stm32-rng.c @@ -68,6 +68,10 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) } while (!sr && --timeout); } + if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), + "bad RNG status - %x\n", sr)) + writel_relaxed(0, priv->base + RNG_SR); + /* If error detected or data not ready... */ if (sr != RNG_SR_DRDY) break; @@ -79,10 +83,6 @@ static int stm32_rng_read(struct hwrng *rng, void *data, size_t max, bool wait) max -= sizeof(u32); } - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), - "bad RNG status - %x\n", sr)) - writel_relaxed(0, priv->base + RNG_SR); - pm_runtime_mark_last_busy((struct device *) priv->rng.priv);
Re: [PATCH v3] Axi-usb: Add support for 64-bit addressing.
On Tuesday, May 24, 2016 10:51:08 AM CEST Nava kishore Manne wrote: > diff --git a/Documentation/devicetree/bindings/usb/udc-xilinx.txt > b/Documentation/devicetree/bindings/usb/udc-xilinx.txt > index 47b4e39..09df757 100644 > --- a/Documentation/devicetree/bindings/usb/udc-xilinx.txt > +++ b/Documentation/devicetree/bindings/usb/udc-xilinx.txt > @@ -1,18 +1,23 @@ > Xilinx USB2 device controller > > Required properties: > -- compatible : Should be "xlnx,usb2-device-4.00.a" > +- compatible : Should be "xlnx,usb2-device-4.00.a" or > + "xlnx,usb2-device-5.00" > - reg: Physical base address and size of the USB2 > device registers map. > - interrupts : Should contain single irq line of USB2 device > controller > - xlnx,has-builtin-dma : if DMA is included > +- dma-ranges : Should be as the following > + A USB host should not have any children that are DMA capable, I think, so this property doesn't make sense here. It should be part of the parent bus. > +- xlnx,addrwidth : Should be the dma addressing size in bits(ex: 64 bits) I'm still unconvinced about the property definition here. What are the possible options for the IP block? I don't think I ever saw a reply from you to my earlier questions. > @@ -214,6 +223,20 @@ static const struct usb_endpoint_descriptor > config_bulk_out_desc = { > .wMaxPacketSize = cpu_to_le16(EP0_MAX_PACKET), > }; > > +/** > + * xudc_write64 - write 64bit value to device registers > + * @addr: base addr of device registers > + * @offset: register offset > + * @val: data to be written > + **/ > +static void xudc_write64(struct xusb_ep *ep, u32 offset, u64 val) > +{ > + struct xusb_udc *udc = ep->udc; > + > + udc->write_fn(udc->addr, offset, lower_32_bits(val)); > + udc->write_fn(udc->addr, offset+0x04, upper_32_bits(val)); > +} > + > /** > * xudc_write32 - little endian write to device registers > * @addr: base addr of device registers > @@ -330,8 +353,13 @@ static int xudc_start_dma(struct xusb_ep *ep, dma_addr_t > src, >* destination registers and then set the length >* into the DMA length register. >*/ > - udc->write_fn(udc->addr, XUSB_DMA_DSAR_ADDR_OFFSET, src); > - udc->write_fn(udc->addr, XUSB_DMA_DDAR_ADDR_OFFSET, dst); > + if (udc->dma_addrwidth > 32) { > + xudc_write64(ep, XUSB_DMA_DSAR_ADDR_OFFSET_LSB, src); > + xudc_write64(ep, XUSB_DMA_DDAR_ADDR_OFFSET_LSB, dst); > + } else { > + udc->write_fn(udc->addr, XUSB_DMA_DSAR_ADDR_OFFSET, src); > + udc->write_fn(udc->addr, XUSB_DMA_DDAR_ADDR_OFFSET, dst); > + } > udc->write_fn(udc->addr, XUSB_DMA_LENGTH_OFFSET, length); > This looks good. Arnd
Re: [PATCH] f2fs: introduce on-disk layout version checking functionality
On Mon, May 23, 2016 at 01:08:05PM -0700, Viacheslav Dubeyko wrote:
> I think that it's some confusion. I didn't introduce any new fields in
> struct f2fs_super_block. The "major_ver" and "minor_ver" fields exist in
> F2FS superblock from the beginning of this file system implementation.
> The content of these two fields are defined during mkfs phase. The
> f2fs_format.c contains such code in f2fs_prepare_super_block():

They exist, but the kernel so far never checked them, and despite that
feature checking works fine with other f2fs features.

> Current version in VERSION file is 1.6.1. So, historically F2FS is using
> version of on-disk layout. The suggested patch simply introduces the
> threshold value F2FS_MAX_SUPP_MAJOR_VERSION with the purpose to refuse
> the mount operation for the case of unsupported version of on-disk
> layout.

While I've never seen an actual piece of documentation for the fields,
it seems that so far they just document the version of mkfs used to
create the file system. Suddenly overloading them with semantics is
just going to create problems.

> First of all, it needs to distinguish two different points. First point,
> we need to increase the on-disk layout version because we are going to
> change on-disk layout in the way that old (current) driver will not
> support.

That's exactly what most file systems use feature flags for.
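A sketch of the feature-flag pattern being advocated here, loosely
modeled on the F2FS_HAS_FEATURE() style of macro (illustrative names,
not a proposed patch):

	/* incompat feature bit in the superblock: an old kernel that
	 * sees an unknown bit refuses the mount, with no notion of a
	 * "version" anywhere */
	#define MYFS_FEATURE_16TB_VOLUME	0x0001

	#define MYFS_HAS_FEATURE(raw_super, mask) \
		(((raw_super)->feature & cpu_to_le32(mask)) != 0)

	if (MYFS_HAS_FEATURE(raw_super, MYFS_FEATURE_16TB_VOLUME) &&
	    !IS_ENABLED(CONFIG_MYFS_16TB_VOLUME_SUPPORT))
		return -EINVAL;	/* refuse the mount */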
Re: [PATCH] f2fs: introduce on-disk layout version checking functionality
On Mon, May 23, 2016 at 02:13:57PM -0700, Jaegeuk Kim wrote: > As Christoph mentioned, how about checking the feature only like this? > > 1. if the feature is ON, > - go 64 bits , when compiled w/ F2FS_MIN_16TB_VOLUME_SUPPORT > - fail to mount, when compiled w/o F2FS_MIN_16TB_VOLUME_SUPPORT > > 2. if the feature is OFF, > - fail to mount, when compiled w/ F2FS_MIN_16TB_VOLUME_SUPPORT > - go 32 bits , when compiled w/o F2FS_MIN_16TB_VOLUME_SUPPORT > > Thoughts? That goes on to the next question: why do we even need a config option for 16TB+ volume support?
Re: [PATCH v3 4/9] powerpc/powernv: Make power7_powersave_common more generic
Hi Shreyas, On Mon, May 23, 2016 at 08:48:37PM +0530, Shreyas B. Prabhu wrote: > power7_powersave_common does common steps needed before entering idle > state and eventually changes MSR to MSR_IDLE and does rfid to > power7_enter_nap_mode. > > Move the updation of HSTATE_HWTHREAD_STATE to power7_powersave_common > from power7_enter_nap_mode and make it more generic by passing the rfid > address as a function parameter. > > Also make function name more generic. > > Reviewed-by: Gautham R. Shenoy > Signed-off-by: Shreyas B. Prabhu > --- > Changes in v3: > == > - Moved HSTATE_HWTHREAD_STATE updation to power_powersave_common > > arch/powerpc/kernel/idle_power_common.S | 30 +- > 1 file changed, 17 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/kernel/idle_power_common.S > b/arch/powerpc/kernel/idle_power_common.S > index 973c9a1..d100577 100644 > --- a/arch/powerpc/kernel/idle_power_common.S > +++ b/arch/powerpc/kernel/idle_power_common.S > @@ -74,8 +74,10 @@ core_idle_lock_held: > * To check IRQ_HAPPENED in r4 > * 0 - don't check > * 1 - check > + * > + * Address to 'rfid' to in r5 > */ > -_GLOBAL(power7_powersave_common) > +_GLOBAL(pnv_powersave_common) You can move this rename to the previous patch where it fits better. [..snip..] > > .globl power7_enter_nap_mode > power7_enter_nap_mode: Ditto. This should be "pnv_enter_idle_mode" in the previous patch. [..snip..] > > _GLOBAL(power7_winkle) > li r3,3 li r3,PNV_THREAD_WINKLE Which should be a separate patch.
Re: [PATCH v3 5/9] powerpc/powernv: abstraction for saving SPRs before entering deep idle states
On Mon, May 23, 2016 at 08:48:38PM +0530, Shreyas B. Prabhu wrote: > Create a function for saving SPRs before entering deep idle states. > This function can be reused for POWER9 deep idle states. > > Signed-off-by: Shreyas B. Prabhu Reviewed-by: Gautham R. Shenoy
[RFC PATCH] sched: fix hierarchical order in rq->leaf_cfs_rq_list
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
child will always be called before its parent.

The hierarchical order in the shares update list was introduced by
commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list")

With the current implementation a child can still be put after its
parent. Let's take the example of

        root
          \
           b
          / \
         c   d*
             |
             e*

with root -> b -> c already enqueued but not d -> e, so the
leaf_cfs_rq_list looks like:

head -> c -> b -> root -> tail

The branch d -> e will be added the first time that d and e are
enqueued, starting with e then d. When e is added, its parent is not
already on the list so e is put at the tail:

head -> c -> b -> root -> e -> tail

Then, d is added at the head because its parent is already on the
list:

head -> d -> c -> b -> root -> e -> tail

e is not placed at the right position and will be called last, whereas
it should be called at the beginning.

Because it follows the bottom-up enqueue sequence, we are sure that we
will finish by adding either a cfs_rq without a parent or a cfs_rq
whose parent is already on the list. We can use this event to detect
when we have finished adding a new branch. For the others, whose
parents are not already added, we have to ensure that they will be
added after the children that have just been inserted in the steps
before, and after any potential parents that are already in the list.
The easiest way is to put the cfs_rq just after the last inserted one
and to keep track of it until the branch is fully added.

Signed-off-by: Vincent Guittot
---
 kernel/sched/core.c  |  1 +
 kernel/sched/fair.c  | 24 ++++++++++++++++++++----
 kernel/sched/sched.h |  1 +
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index adcafda..ef97be4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7351,6 +7351,7 @@ void __init sched_init(void)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		root_task_group.shares = ROOT_TASK_GROUP_LOAD;
 		INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
+		rq->leaf_alone = &rq->leaf_cfs_rq_list;
 		/*
 		 * How much cpu bandwidth does root_task_group get?
 		 *
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8a33ab..07f0f1b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -290,15 +290,31 @@ static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 		 * Ensure we either appear before our parent (if already
 		 * enqueued) or force our parent to appear after us when it is
 		 * enqueued. The fact that we always enqueue bottom-up
-		 * reduces this to two cases.
+		 * reduces this to two cases and a special case for the root
+		 * cfs_rq.
*/ if (cfs_rq->tg->parent && cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->on_list) { - list_add_rcu(&cfs_rq->leaf_cfs_rq_list, - &rq_of(cfs_rq)->leaf_cfs_rq_list); - } else { + /* Add the child just before its parent */ + list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list, + &(cfs_rq->tg->parent->cfs_rq[cpu_of(rq_of(cfs_rq))]->leaf_cfs_rq_list)); + rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list; + } else if (!cfs_rq->tg->parent) { + /* +* cfs_rq without parent should be +* at the end of the list +*/ list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list, &rq_of(cfs_rq)->leaf_cfs_rq_list); + rq_of(cfs_rq)->leaf_alone = &rq_of(cfs_rq)->leaf_cfs_rq_list; + } else { + /* +* Our parent has not already been added so make sure +* that it will be put after us +*/ + list_add_rcu(&cfs_rq->leaf_cfs_rq_list, + rq_of(cfs_rq)->leaf_alone); + rq_of(cfs_rq)->leaf_alone = &cfs_rq->leaf_cfs_rq_list; } cfs_rq->on_list = 1; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 69da6fc..9693fe9 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -607,6 +607,7 @@ struct rq { #ifdef CONFIG_FAIR_GROUP_SCHED /* list of leaf cfs_rq on this cpu: */ struct list_head leaf_cfs_rq_list; + struct list_head *leaf_alone; #endif /* CONFIG_FAIR_GROUP_SCHED */ /* -- 1.9.1
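To make the fix concrete, here is how the changelog's example plays out
with this patch applied (my reading of the code above):

e is enqueued first; its parent d is not yet on the list, so e is
inserted at leaf_alone, which initially points at the list head, and
leaf_alone is then moved to e:

head -> e -> c -> b -> root -> tail

d is enqueued next; its parent b is already on the list, so d is
inserted just before b:

head -> e -> c -> d -> b -> root -> tail

Every child now precedes its parent (e before d, d before b, b before
root), which restores the bottom-up update order.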
[RFC PATCH v2] sched: reflect sched_entity movement into task_group's utilization
Ensure that the changes of the utilization of a sched_entity will be reflected in the task_group hierarchy. This patch tries another way than the flat utilization hierarchy proposal to ensure the changes will be propagated down to the root cfs. The way to compute the sched average metrics stays the same so the utilization only need to be synced with the local cfs rq timestamp. Changes since v1: - This patch needs the patch that fixes issue with rq->leaf_cfs_rq_list "sched: fix hierarchical order in rq->leaf_cfs_rq_list" in order to work correctly. I haven't sent them as a single patchset because the fix is independant of this one - Merge some functions that are always used together - During update of blocked load, ensure that the sched_entity is synced with the cfs_rq applying changes - Fix an issue when task changes its cpu affinity Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 168 --- kernel/sched/sched.h | 1 + 2 files changed, 147 insertions(+), 22 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 07f0f1b..2714e31 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2591,6 +2591,7 @@ static void update_cfs_shares(struct cfs_rq *cfs_rq) reweight_entity(cfs_rq_of(se), se, shares); } + #else /* CONFIG_FAIR_GROUP_SCHED */ static inline void update_cfs_shares(struct cfs_rq *cfs_rq) { @@ -2817,6 +2818,28 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa, return decayed; } +#ifndef CONFIG_64BIT +static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq) +{ + u64 last_update_time_copy; + u64 last_update_time; + + do { + last_update_time_copy = cfs_rq->load_last_update_time_copy; + smp_rmb(); + last_update_time = cfs_rq->avg.last_update_time; + } while (last_update_time != last_update_time_copy); + + return last_update_time; +} +#else +static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq) +{ + return cfs_rq->avg.last_update_time; +} +#endif + + #ifdef CONFIG_FAIR_GROUP_SCHED /* * Updating tg's load_avg is necessary before update_cfs_share (which is done) @@ -2884,8 +2907,86 @@ void set_task_rq_fair(struct sched_entity *se, se->avg.last_update_time = n_last_update_time; } } + +/* + * Save how much utilization has just been added/removed on cfs rq so we can + * propagate it across the whole tg tree + */ +static void set_tg_cfs_rq_util(struct cfs_rq *cfs_rq, int delta) +{ + if (cfs_rq->tg == &root_task_group) + return; + + cfs_rq->diff_util_avg += delta; +} + +/* Take into account the change of the utilization of a child task group */ +static void update_tg_cfs_util(struct sched_entity *se, int blocked) +{ + int delta; + struct cfs_rq *cfs_rq; + long update_util_avg; + long last_update_time; + long old_util_avg; + + + /* +* update_blocked_average will call this function for root cfs_rq +* whose se is null. 
In this case just return +*/ + if (!se) + return; + + if (entity_is_task(se)) + return 0; + + /* Get sched_entity of cfs rq */ + cfs_rq = group_cfs_rq(se); + + update_util_avg = cfs_rq->diff_util_avg; + + if (!update_util_avg) + return 0; + + /* Clear pending changes */ + cfs_rq->diff_util_avg = 0; + + /* Add changes in sched_entity utilizaton */ + old_util_avg = se->avg.util_avg; + se->avg.util_avg = max_t(long, se->avg.util_avg + update_util_avg, 0); + se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX; + + /* Get parent cfs_rq */ + cfs_rq = cfs_rq_of(se); + + if (blocked) { + /* +* blocked utilization has to be synchronized with its parent +* cfs_rq's timestamp +*/ + last_update_time = cfs_rq_last_update_time(cfs_rq); + + __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), + &se->avg, + se->on_rq * scale_load_down(se->load.weight), + cfs_rq->curr == se, NULL); + } + + delta = se->avg.util_avg - old_util_avg; + + cfs_rq->avg.util_avg = max_t(long, cfs_rq->avg.util_avg + delta, 0); + cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX; + + set_tg_cfs_rq_util(cfs_rq, delta); +} + #else /* CONFIG_FAIR_GROUP_SCHED */ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {} + +static inline void set_tg_cfs_rq_util(struct cfs_rq *cfs_rq, int delta) {} + +static inline void update_tg_cfs_util(struct sched_entity *se, int sync) {} + #endif /* CONFIG_FAIR_GROUP_SCHED */ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq); @@ -2925,6 +3026,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool up
Re: [PATCH] hwrng: stm32 - fix build warning
On Tuesday, May 24, 2016 10:50:17 AM CEST Maxime Coquelin wrote: > diff --git a/drivers/char/hw_random/stm32-rng.c > b/drivers/char/hw_random/stm32-rng.c > index 92a810648bd0..2a0fc90e4dc3 100644 > --- a/drivers/char/hw_random/stm32-rng.c > +++ b/drivers/char/hw_random/stm32-rng.c > @@ -68,6 +68,10 @@ static int stm32_rng_read(struct hwrng *rng, void > *data, size_t max, bool wait) > } while (!sr && --timeout); > } > > + if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > + "bad RNG status - %x\n", sr)) > + writel_relaxed(0, priv->base + RNG_SR); > + > /* If error detected or data not ready... */ > if (sr != RNG_SR_DRDY) > break; > @@ -79,10 +83,6 @@ static int stm32_rng_read(struct hwrng *rng, void > *data, size_t max, bool wait) > max -= sizeof(u32); > } > > - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), > - "bad RNG status - %x\n", sr)) > - writel_relaxed(0, priv->base + RNG_SR); > - > pm_runtime_mark_last_busy((struct device *) priv->rng.priv); > pm_runtime_put_sync_autosuspend((struct device *) priv->rng.priv); > > Thanks, > Yes, that looks good to me. Arnd
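For anyone skimming the thread: with Maxime's final version above, the
read loop ends up roughly like this (reconstructed from the diffs in
this thread; whitespace approximate):

	while (max > sizeof(u32)) {
		sr = readl_relaxed(priv->base + RNG_SR);
		if (!sr && wait) {
			unsigned int timeout = RNG_TIMEOUT;

			do {
				cpu_relax();
				sr = readl_relaxed(priv->base + RNG_SR);
			} while (!sr && --timeout);
		}

		/* check errors on every iteration, while sr is fresh */
		if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS),
			      "bad RNG status - %x\n", sr))
			writel_relaxed(0, priv->base + RNG_SR);

		/* If error detected or data not ready... */
		if (sr != RNG_SR_DRDY)
			break;

		*(u32 *)data = readl_relaxed(priv->base + RNG_DR);

		retval += sizeof(u32);
		data += sizeof(u32);
		max -= sizeof(u32);
	}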
Re: [PATCH] drm/rockchip: Return -EBUSY if there's already a pending flip event v5
On Tue, May 24, 2016 at 10:41:30AM +0200, Heiko Stuebner wrote: > Am Dienstag, 24. Mai 2016, 10:37:49 schrieb Daniel Vetter: > > On Tue, May 24, 2016 at 10:30:50AM +0200, Daniel Vetter wrote: > > > On Tue, May 24, 2016 at 10:28:42AM +0200, Heiko Stuebner wrote: > > > > Hi Tomeu, > > > > > > > > Patch subject: please put the version into the brackets, so [PATCH v5] > > > > as it shouldn't be part of the commit log. > > > > > > > > Am Dienstag, 24. Mai 2016, 09:27:37 schrieb Tomeu Vizoso: > > > > > As per the docs, atomic_commit should return -EBUSY "if an > > > > > asycnhronous > > > > > updated is requested and there is an earlier updated pending". > > > > > > > > > > v2: Use the status of the workqueue instead of vop->event, and don't > > > > > add > > > > > a superfluous wait on the workqueue. > > > > > > > > > > v3: Drop work_busy, as there's a sizeable delay when the worker > > > > > finishes, which introduces a race in which the client has already > > > > > received the last flip event but the next page flip ioctl will still > > > > > return -EBUSY because work_busy returns outdated information. > > > > > > > > > > v4: Hold dev->event_lock while checking the VOP's event field as > > > > > suggested by Daniel Stone. > > > > > > > > > > v5: Only block if there's outstanding work if it's a blocking call. > > > > > > > > similarly, please put the changelog below the "---" and above the > > > > diffstat.> > > > drm culture is to keep it above, since it's kinda useful sometimes when > > > later on trying to reconstruct wtf was discussed and why a patch was > > > merged. > > > > Maybe needs a bit more context: The only stuff you raised in your review > > is tiny style nits of pretty much utter irrelevance. No substantial and > > material feedback anywehere, and in my opinion in such a case either fix > > up the nits when applying (when you feel really strongly about perfect > > patches), or just merge as-is. > > > > But sending out content-less bikesheds like these just adds noise and > > helps no-one. I think at least some spelling stuff is the minimal bar (but > > then just include your r-b tag), but personally I don't even care about > > that so much, as long as it's still legible. > > ok, will keep that (both mails) in mind for future stuff. And to clarify, review of patches is very much appreciated, but to be effective it should be top down. First assess whether it's a good idea, then whether the implementation makes sense, then go down into style naming and details like that. And it's important to tell the submitter where they are in that process, too. More in-depth writeup of a good review approach: http://sarah.thesharps.us/2014/09/01/the-gentle-art-of-patch-review/ Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [GIT PULL] EFI fix
* Josh Poimboeuf wrote: > On Mon, May 23, 2016 at 01:08:01PM +0100, Matt Fleming wrote: > > On Mon, 16 May, at 01:05:45PM, Linus Torvalds wrote: > > > > > > I think the right fix is to just get rid of that silly conditional > > > frame pointer thing, and always use frame pointers in this stub > > > function. And then we don't need that (odd) load to get the old stack > > > pointer into %rax - we can just use the frame pointer. > > > > > > Something like the attached completely untested patch. > > > > Linus, are you going to apply your patch directly or would you prefer > > someone to send a pull request which also includes Josh's patch for > > arch/x86/entry/thunk_64.S? > > Ingo already merged both patches into tip:x86/urgent, so I'd presume > he'll be sending a pull request for them soon. Correct, I plan to send them later today. Thanks, Ingo
Re: [PATCHv3] support for AD5820 camera auto-focus coil
Hi! > >+static int ad5820_registered(struct v4l2_subdev *subdev) > >+{ > >+struct ad5820_device *coil = to_ad5820_device(subdev); > >+struct i2c_client *client = v4l2_get_subdevdata(subdev); > >+ > >+coil->vana = regulator_get(&client->dev, "VANA"); > > devm_regulator_get()? I'd rather avoid devm_ here. Driver is simple enough to allow it. > >+#define AD5820_RAMP_MODE_LINEAR (0 << 3) > >+#define AD5820_RAMP_MODE_64_16 (1 << 3) > >+ > >+struct ad5820_platform_data { > >+int (*set_xshutdown)(struct v4l2_subdev *subdev, int set); > >+}; > >+ > >+#define to_ad5820_device(sd)container_of(sd, struct ad5820_device, > >subdev) > >+ > >+struct ad5820_device { > >+struct v4l2_subdev subdev; > >+struct ad5820_platform_data *platform_data; > >+struct regulator *vana; > >+ > >+struct v4l2_ctrl_handler ctrls; > >+u32 focus_absolute; > >+u32 focus_ramp_time; > >+u32 focus_ramp_mode; > >+ > >+struct mutex power_lock; > >+int power_count; > >+ > >+int standby : 1; > >+}; > >+ > > The same for struct ad5820_device, is it really part of the public API? Let me check what can be done with it. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[PATCH RESEND 2/8] mm: clean up non-standard page->_mapcount users
- Add a proper comment to page->_mapcount. - Introduce a macro for generating helper functions. - Place all special page->_mapcount values next to each other so that readers can see all possible values and so we don't get duplicates. Signed-off-by: Vladimir Davydov --- include/linux/mm_types.h | 5 include/linux/page-flags.h | 73 -- scripts/tags.sh| 3 ++ 3 files changed, 40 insertions(+), 41 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3cc5977a9cab..16bdef7943e3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -85,6 +85,11 @@ struct page { /* * Count of ptes mapped in mms, to show when * page is mapped & limit reverse map searches. +* +* Extra information about page type may be +* stored here for pages that are never mapped, +* in which case the value MUST BE <= -2. +* See page-flags.h for more details. */ atomic_t _mapcount; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e5a32445f930..9940ade6a25e 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -593,54 +593,45 @@ TESTPAGEFLAG_FALSE(DoubleMap) #endif /* - * PageBuddy() indicate that the page is free and in the buddy system - * (see mm/page_alloc.c). - * - * PAGE_BUDDY_MAPCOUNT_VALUE must be <= -2 but better not too close to - * -2 so that an underflow of the page_mapcount() won't be mistaken - * for a genuine PAGE_BUDDY_MAPCOUNT_VALUE. -128 can be created very - * efficiently by most CPU architectures. + * For pages that are never mapped to userspace, page->mapcount may be + * used for storing extra information about page type. Any value used + * for this purpose must be <= -2, but it's better start not too close + * to -2 so that an underflow of the page_mapcount() won't be mistaken + * for a special page. */ -#define PAGE_BUDDY_MAPCOUNT_VALUE (-128) - -static inline int PageBuddy(struct page *page) -{ - return atomic_read(&page->_mapcount) == PAGE_BUDDY_MAPCOUNT_VALUE; +#define PAGE_MAPCOUNT_OPS(uname, lname) \ +static __always_inline int Page##uname(struct page *page) \ +{ \ + return atomic_read(&page->_mapcount) == \ + PAGE_##lname##_MAPCOUNT_VALUE; \ +} \ +static __always_inline void __SetPage##uname(struct page *page) \ +{ \ + VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); \ + atomic_set(&page->_mapcount, PAGE_##lname##_MAPCOUNT_VALUE);\ +} \ +static __always_inline void __ClearPage##uname(struct page *page) \ +{ \ + VM_BUG_ON_PAGE(!Page##uname(page), page); \ + atomic_set(&page->_mapcount, -1); \ } -static inline void __SetPageBuddy(struct page *page) -{ - VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); - atomic_set(&page->_mapcount, PAGE_BUDDY_MAPCOUNT_VALUE); -} +/* + * PageBuddy() indicate that the page is free and in the buddy system + * (see mm/page_alloc.c). + */ +#define PAGE_BUDDY_MAPCOUNT_VALUE (-128) +PAGE_MAPCOUNT_OPS(Buddy, BUDDY) -static inline void __ClearPageBuddy(struct page *page) -{ - VM_BUG_ON_PAGE(!PageBuddy(page), page); - atomic_set(&page->_mapcount, -1); -} +/* + * PageBalloon() is set on pages that are on the balloon page list + * (see mm/balloon_compaction.c). 
+ */ +#define PAGE_BALLOON_MAPCOUNT_VALUE(-256) +PAGE_MAPCOUNT_OPS(Balloon, BALLOON) extern bool is_free_buddy_page(struct page *page); -#define PAGE_BALLOON_MAPCOUNT_VALUE (-256) - -static inline int PageBalloon(struct page *page) -{ - return atomic_read(&page->_mapcount) == PAGE_BALLOON_MAPCOUNT_VALUE; -} - -static inline void __SetPageBalloon(struct page *page) -{ - VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page); - atomic_set(&page->_mapcount, PAGE_BALLOON_MAPCOUNT_VALUE); -} - -static inline void __ClearPageBalloon(struct page *page) -{ - VM_BUG_ON_PAGE(!PageBalloon(page), page); - atomic_set(&page->_mapcount, -1); -} - /* * If network-based swap is enabled, sl*b must keep track of whether pages * were allocated from pfmemalloc reserves. diff --git a/scrip
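To see what the macro buys: PAGE_MAPCOUNT_OPS(Buddy, BUDDY) generates
the same three helpers the patch removes (modulo __always_inline), so
callers elsewhere are untouched. A sketch of the generated interface
and its typical use:

	/* generated by PAGE_MAPCOUNT_OPS(Buddy, BUDDY): */
	int PageBuddy(struct page *page);	  /* _mapcount == -128? */
	void __SetPageBuddy(struct page *page);	  /* -1 -> -128 */
	void __ClearPageBuddy(struct page *page); /* -128 -> -1 */

	/* e.g. when a free page enters or leaves the buddy lists: */
	__SetPageBuddy(page);
	...
	if (PageBuddy(page))
		__ClearPageBuddy(page);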
use default speed of the eMMC
Dear chrisball,

Hello, I'm Minwoo Jang. I have a question about using the default
speed of the eMMC.

When the default speed is used, mmc_select_bus_width() is never
called. So the eMMC cannot be switched to 4-bit or 8-bit bus width, I
think. Please give me your opinion on the following diff. Thank you.

=====

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 94b4462..3b1cc4d 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -952,6 +952,8 @@ static int mmc_select_bus_width(struct mmc_card *card)
 
 		if (!err) {
 			err = bus_width;
+			pr_warn("%s: switch to bus width %d\n",
+				mmc_hostname(host), (1 << bus_width));
 			break;
 		} else {
 			pr_warn("%s: switch to bus width %d failed\n",
@@ -1500,6 +1502,14 @@ static int mmc_init_card(struct mmc_host *host, u32 ocr,
 			if (err)
 				goto err;
 		}
+	} else {
+		/* Select the bus width for normal speed mode */
+		err = mmc_select_bus_width(card);
+		if (IS_ERR_VALUE(err)) {
+			pr_warn("%s: Selecting bus width failed\n",
+				mmc_hostname(card->host));
+			goto err;
+		}
 	}

Best regards,
MW Jang.
[PATCH RESEND 0/8] More stuff to charge to kmemcg
[resending with all relevant lists in Cc] Hi, This patch implements per kmemcg accounting of page tables (x86-only), pipe buffers, and unix socket buffers. Basically, this is v2 of my earlier attempt [1], addressing comments by Andrew, namely: lack of comments to non-standard _mapcount usage, extra overhead even when kmemcg is unused, wrong handling of stolen pipe buffer pages. Patches 1-3 are just cleanups that are not supposed to introduce any functional changes. Patches 4 and 5 move charge/uncharge to generic page allocator paths for the sake of accounting pipe and unix socket buffers. Patches 5-7 make x86 page tables, pipe buffers, and unix socket buffers accountable. [1] http://lkml.kernel.org/r/%3ccover.1443262808.git.vdavy...@parallels.com%3E Thanks, Vladimir Davydov (8): mm: remove pointless struct in struct page definition mm: clean up non-standard page->_mapcount users mm: memcontrol: cleanup kmem charge functions mm: charge/uncharge kmemcg from generic page allocator paths mm: memcontrol: teach uncharge_list to deal with kmem pages arch: x86: charge page tables to kmemcg pipe: account to kmemcg af_unix: charge buffers to kmemcg arch/x86/include/asm/pgalloc.h | 12 - arch/x86/mm/pgtable.c | 11 ++-- fs/pipe.c | 32 --- include/linux/gfp.h| 10 +--- include/linux/memcontrol.h | 103 +++- include/linux/mm_types.h | 73 - include/linux/page-flags.h | 78 +-- kernel/fork.c | 6 +-- mm/memcontrol.c| 117 - mm/page_alloc.c| 63 +- mm/slab.h | 16 -- mm/slab_common.c | 2 +- mm/slub.c | 6 +-- mm/vmalloc.c | 6 +-- net/unix/af_unix.c | 1 + scripts/tags.sh| 3 ++ 16 files changed, 245 insertions(+), 294 deletions(-) -- 2.1.4
[PATCH RESEND 3/8] mm: memcontrol: cleanup kmem charge functions
- Handle memcg_kmem_enabled check out to the caller. This reduces the number of function definitions making the code easier to follow. At the same time it doesn't result in code bloat, because all of these functions are used only in one or two places. - Move __GFP_ACCOUNT check to the caller as well so that one wouldn't have to dive deep into memcg implementation to see which allocations are charged and which are not. - Refresh comments. Signed-off-by: Vladimir Davydov --- include/linux/memcontrol.h | 103 +++-- mm/memcontrol.c| 75 - mm/page_alloc.c| 9 ++-- mm/slab.h | 16 +-- 4 files changed, 80 insertions(+), 123 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a805474df4ab..2d03975c7dc0 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -754,6 +754,13 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) } #endif +struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep); +void memcg_kmem_put_cache(struct kmem_cache *cachep); +int memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order, + struct mem_cgroup *memcg); +int memcg_kmem_charge(struct page *page, gfp_t gfp, int order); +void memcg_kmem_uncharge(struct page *page, int order); + #if defined(CONFIG_MEMCG) && !defined(CONFIG_SLOB) extern struct static_key_false memcg_kmem_enabled_key; @@ -775,22 +782,6 @@ static inline bool memcg_kmem_enabled(void) } /* - * In general, we'll do everything in our power to not incur in any overhead - * for non-memcg users for the kmem functions. Not even a function call, if we - * can avoid it. - * - * Therefore, we'll inline all those functions so that in the best case, we'll - * see that kmemcg is off for everybody and proceed quickly. If it is on, - * we'll still do most of the flag checking inline. We check a lot of - * conditions, but because they are pretty simple, they are expected to be - * fast. - */ -int __memcg_kmem_charge_memcg(struct page *page, gfp_t gfp, int order, - struct mem_cgroup *memcg); -int __memcg_kmem_charge(struct page *page, gfp_t gfp, int order); -void __memcg_kmem_uncharge(struct page *page, int order); - -/* * helper for accessing a memcg's index. It will be used as an index in the * child cache array in kmem_cache, and also to derive its name. This function * will return -1 when this is not a kmem-limited memcg. @@ -800,67 +791,6 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg) return memcg ? memcg->kmemcg_id : -1; } -struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp); -void __memcg_kmem_put_cache(struct kmem_cache *cachep); - -static inline bool __memcg_kmem_bypass(void) -{ - if (!memcg_kmem_enabled()) - return true; - if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD)) - return true; - return false; -} - -/** - * memcg_kmem_charge: charge a kmem page - * @page: page to charge - * @gfp: reclaim mode - * @order: allocation order - * - * Returns 0 on success, an error code on failure. 
- */ -static __always_inline int memcg_kmem_charge(struct page *page, -gfp_t gfp, int order) -{ - if (__memcg_kmem_bypass()) - return 0; - if (!(gfp & __GFP_ACCOUNT)) - return 0; - return __memcg_kmem_charge(page, gfp, order); -} - -/** - * memcg_kmem_uncharge: uncharge a kmem page - * @page: page to uncharge - * @order: allocation order - */ -static __always_inline void memcg_kmem_uncharge(struct page *page, int order) -{ - if (memcg_kmem_enabled()) - __memcg_kmem_uncharge(page, order); -} - -/** - * memcg_kmem_get_cache: selects the correct per-memcg cache for allocation - * @cachep: the original global kmem cache - * - * All memory allocated from a per-memcg cache is charged to the owner memcg. - */ -static __always_inline struct kmem_cache * -memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp) -{ - if (__memcg_kmem_bypass()) - return cachep; - return __memcg_kmem_get_cache(cachep, gfp); -} - -static __always_inline void memcg_kmem_put_cache(struct kmem_cache *cachep) -{ - if (memcg_kmem_enabled()) - __memcg_kmem_put_cache(cachep); -} - /** * memcg_kmem_update_page_stat - update kmem page state statistics * @page: the page @@ -883,15 +813,6 @@ static inline bool memcg_kmem_enabled(void) return false; } -static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order) -{ - return 0; -} - -static inline void memcg_kmem_uncharge(struct page *page, int order) -{ -} - static inline int memcg_cache_id(struct mem_cgroup *memcg) { return -1; @@ -905,16 +826,6 @@ static i
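With the checks hoisted out, the caller-side pattern in the page
allocator becomes roughly this (a sketch based on the description
above, not the exact hunk):

	page = alloc_pages(gfp_mask, order);
	if (memcg_kmem_enabled() && (gfp_mask & __GFP_ACCOUNT) && page &&
	    memcg_kmem_charge(page, gfp_mask, order) != 0) {
		__free_pages(page, order);
		page = NULL;
	}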
[PATCH RESEND 5/8] mm: memcontrol: teach uncharge_list to deal with kmem pages
Page table pages are batched-freed in release_pages on most architectures. If we want to charge them to kmemcg (this is what is done later in this series), we need to teach mem_cgroup_uncharge_list to handle kmem pages. Signed-off-by: Vladimir Davydov --- mm/memcontrol.c | 42 -- 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 482b4a0c97e4..89a421ee4713 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5432,15 +5432,18 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg, static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout, unsigned long nr_anon, unsigned long nr_file, - unsigned long nr_huge, struct page *dummy_page) + unsigned long nr_huge, unsigned long nr_kmem, + struct page *dummy_page) { - unsigned long nr_pages = nr_anon + nr_file; + unsigned long nr_pages = nr_anon + nr_file + nr_kmem; unsigned long flags; if (!mem_cgroup_is_root(memcg)) { page_counter_uncharge(&memcg->memory, nr_pages); if (do_memsw_account()) page_counter_uncharge(&memcg->memsw, nr_pages); + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && nr_kmem) + page_counter_uncharge(&memcg->kmem, nr_kmem); memcg_oom_recover(memcg); } @@ -5463,6 +5466,7 @@ static void uncharge_list(struct list_head *page_list) unsigned long nr_anon = 0; unsigned long nr_file = 0; unsigned long nr_huge = 0; + unsigned long nr_kmem = 0; unsigned long pgpgout = 0; struct list_head *next; struct page *page; @@ -5473,8 +5477,6 @@ static void uncharge_list(struct list_head *page_list) */ next = page_list->next; do { - unsigned int nr_pages = 1; - page = list_entry(next, struct page, lru); next = page->lru.next; @@ -5493,31 +5495,35 @@ static void uncharge_list(struct list_head *page_list) if (memcg != page->mem_cgroup) { if (memcg) { uncharge_batch(memcg, pgpgout, nr_anon, nr_file, - nr_huge, page); - pgpgout = nr_anon = nr_file = nr_huge = 0; + nr_huge, nr_kmem, page); + pgpgout = nr_anon = nr_file = + nr_huge = nr_kmem = 0; } memcg = page->mem_cgroup; } - if (PageTransHuge(page)) { - nr_pages <<= compound_order(page); - VM_BUG_ON_PAGE(!PageTransHuge(page), page); - nr_huge += nr_pages; - } + if (!PageKmemcg(page)) { + unsigned int nr_pages = 1; - if (PageAnon(page)) - nr_anon += nr_pages; - else - nr_file += nr_pages; + if (PageTransHuge(page)) { + nr_pages <<= compound_order(page); + VM_BUG_ON_PAGE(!PageTransHuge(page), page); + nr_huge += nr_pages; + } + if (PageAnon(page)) + nr_anon += nr_pages; + else + nr_file += nr_pages; + pgpgout++; + } else + nr_kmem += 1 << compound_order(page); page->mem_cgroup = NULL; - - pgpgout++; } while (next != page_list); if (memcg) uncharge_batch(memcg, pgpgout, nr_anon, nr_file, - nr_huge, page); + nr_huge, nr_kmem, page); } /** -- 2.1.4
[PATCH RESEND 7/8] pipe: account to kmemcg
Pipes can consume a significant amount of system memory, hence they should be accounted to kmemcg. This patch marks pipe_inode_info and anonymous pipe buffer page allocations as __GFP_ACCOUNT so that they would be charged to kmemcg. Note, since a pipe buffer page can be "stolen" and get reused for other purposes, including mapping to userspace, we clear PageKmemcg thus resetting page->_mapcount and uncharge it in anon_pipe_buf_steal, which is introduced by this patch. Signed-off-by: Vladimir Davydov Cc: Alexander Viro --- fs/pipe.c | 32 ++-- 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 0d3f5165cb0b..4b32928f5426 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include @@ -137,6 +138,22 @@ static void anon_pipe_buf_release(struct pipe_inode_info *pipe, put_page(page); } +static int anon_pipe_buf_steal(struct pipe_inode_info *pipe, + struct pipe_buffer *buf) +{ + struct page *page = buf->page; + + if (page_count(page) == 1) { + if (memcg_kmem_enabled()) { + memcg_kmem_uncharge(page, 0); + __ClearPageKmemcg(page); + } + __SetPageLocked(page); + return 0; + } + return 1; +} + /** * generic_pipe_buf_steal - attempt to take ownership of a &pipe_buffer * @pipe: the pipe that the buffer belongs to @@ -219,7 +236,7 @@ static const struct pipe_buf_operations anon_pipe_buf_ops = { .can_merge = 1, .confirm = generic_pipe_buf_confirm, .release = anon_pipe_buf_release, - .steal = generic_pipe_buf_steal, + .steal = anon_pipe_buf_steal, .get = generic_pipe_buf_get, }; @@ -227,7 +244,7 @@ static const struct pipe_buf_operations packet_pipe_buf_ops = { .can_merge = 0, .confirm = generic_pipe_buf_confirm, .release = anon_pipe_buf_release, - .steal = generic_pipe_buf_steal, + .steal = anon_pipe_buf_steal, .get = generic_pipe_buf_get, }; @@ -405,7 +422,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) int copied; if (!page) { - page = alloc_page(GFP_HIGHUSER); + page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT); if (unlikely(!page)) { ret = ret ? : -ENOMEM; break; @@ -611,7 +628,7 @@ struct pipe_inode_info *alloc_pipe_info(void) { struct pipe_inode_info *pipe; - pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL); + pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT); if (pipe) { unsigned long pipe_bufs = PIPE_DEF_BUFFERS; struct user_struct *user = get_current_user(); @@ -619,7 +636,9 @@ struct pipe_inode_info *alloc_pipe_info(void) if (!too_many_pipe_buffers_hard(user)) { if (too_many_pipe_buffers_soft(user)) pipe_bufs = 1; - pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * pipe_bufs, GFP_KERNEL); + pipe->bufs = kcalloc(pipe_bufs, +sizeof(struct pipe_buffer), +GFP_KERNEL_ACCOUNT); } if (pipe->bufs) { @@ -1010,7 +1029,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages) if (nr_pages < pipe->nrbufs) return -EBUSY; - bufs = kcalloc(nr_pages, sizeof(*bufs), GFP_KERNEL | __GFP_NOWARN); + bufs = kcalloc(nr_pages, sizeof(*bufs), + GFP_KERNEL_ACCOUNT | __GFP_NOWARN); if (unlikely(!bufs)) return -ENOMEM; -- 2.1.4
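For reference, GFP_KERNEL_ACCOUNT (include/linux/gfp.h) is just
GFP_KERNEL plus the accounting flag, so the kzalloc/kcalloc/alloc_page
call sites above opt in to kmemcg charging with no other behavioural
change:

	#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)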
[PATCH] driver: input: touchscreen: add Raydium I2C touch driver
Raydium I2C touch driver. Signed-off-by: jeffrey.lin --- drivers/input/touchscreen/Kconfig | 12 + drivers/input/touchscreen/Makefile |1 + drivers/input/touchscreen/raydium_i2c_ts.c | 1208 3 files changed, 1221 insertions(+) create mode 100644 drivers/input/touchscreen/raydium_i2c_ts.c diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig index 3f3f6ee..df0e2ed 100644 --- a/drivers/input/touchscreen/Kconfig +++ b/drivers/input/touchscreen/Kconfig @@ -915,6 +915,18 @@ config TOUCHSCREEN_PCAP To compile this driver as a module, choose M here: the module will be called pcap_ts. +config TOUCHSCREEN_RM_TS + tristate "Raydium I2C Touchscreen" + depends on I2C + help + Say Y here if you have a Raydium series I2C touchscreen, + such as the RM32380, connected to your system. + + If unsure, say N. + + To compile this driver as a module, choose M here: the + module will be called raydium_i2c_ts. + config TOUCHSCREEN_ST1232 tristate "Sitronix ST1232 touchscreen controllers" depends on I2C diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile index 4941f2d..99e08cf 100644 --- a/drivers/input/touchscreen/Makefile +++ b/drivers/input/touchscreen/Makefile @@ -55,6 +55,7 @@ obj-$(CONFIG_TOUCHSCREEN_USB_COMPOSITE) += usbtouchscreen.o obj-$(CONFIG_TOUCHSCREEN_PCAP) += pcap_ts.o obj-$(CONFIG_TOUCHSCREEN_PENMOUNT) += penmount.o obj-$(CONFIG_TOUCHSCREEN_PIXCIR) += pixcir_i2c_ts.o +obj-$(CONFIG_TOUCHSCREEN_RM_TS)+= raydium_i2c_ts.o obj-$(CONFIG_TOUCHSCREEN_S3C2410) += s3c2410_ts.o obj-$(CONFIG_TOUCHSCREEN_ST1232) += st1232.o obj-$(CONFIG_TOUCHSCREEN_STMPE)+= stmpe-ts.o diff --git a/drivers/input/touchscreen/raydium_i2c_ts.c b/drivers/input/touchscreen/raydium_i2c_ts.c new file mode 100644 index 000..27c84c5 --- /dev/null +++ b/drivers/input/touchscreen/raydium_i2c_ts.c @@ -0,0 +1,1208 @@ +/* + * Raydium touchscreen I2C driver. + * + * Copyright (C) 2012-2014, Raydium Semiconductor Corporation. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2, and only version 2, as published by the + * Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * Raydium reserves the right to make changes without further notice + * to the materials described herein. Raydium does not assume any + * liability arising out of the application described herein. 
+ * + * Contact Raydium Semiconductor Corporation at www.rad-ic.com + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +/* Slave I2C mode */ +#define RM_BOOT_BLDR 0x02 +#define RM_BOOT_MAIN 0x03 + +/* I2C bootloader commands */ +#define RM_CMD_BOOT_PAGE_WRT 0x0B/* send bl page write */ +#define RM_CMD_BOOT_WRT0x11/* send bl write */ +#define RM_CMD_BOOT_ACK0x22/* send ack*/ +#define RM_CMD_BOOT_CHK0x33/* send data check */ +#define RM_CMD_BOOT_READ 0x44/* send wait bl data ready*/ + +#define RM_BOOT_RDY0xFF/* bl data ready */ + +/* I2C main commands */ +#define RM_CMD_QUERY_BANK 0x2B +#define RM_CMD_DATA_BANK 0x4D +#define RM_CMD_ENTER_SLEEP 0x4E +#define RM_CMD_BANK_SWITCH 0xAA + +#define RM_RESET_MSG_ADDR 0x4004 + +#define RM_MAX_READ_SIZE 56 + +/* Touch relative info */ +#define RM_MAX_RETRIES 3 +#define RM_MAX_TOUCH_NUM 10 +#define RM_BOOT_DELAY_MS 100 + +/* Offsets in contact data */ +#define RM_CONTACT_STATE_POS 0 +#define RM_CONTACT_X_POS 1 +#define RM_CONTACT_Y_POS 3 +#define RM_CONTACT_PRESSURE_POS5 /*FIXME, correct 5*/ +#define RM_CONTACT_WIDTH_X_POS 6 +#define RM_CONTACT_WIDTH_Y_POS 7 + +/* Bootloader relative info */ +#define RM_BL_WRT_CMD_SIZE 3 /* bl flash wrt cmd size */ +#define RM_BL_WRT_PKG_SIZE 32 /* bl wrt pkg size */ +#define RM_BL_WRT_LEN (RM_BL_WRT_PKG_SIZE + RM_BL_WRT_CMD_SIZE) +#define RM_FW_PAGE_SIZE128 +#define RM_MAX_FW_RETRIES 30 +#define RM_MAX_FW_SIZE (0xD000)/*define max firmware size*/ + +#define RM_POWERON_DELAY_USEC 500 +#define RM_RESET_DELAY_MSEC50 + +enum raydium_bl_cmd { + BL_HEADER = 0, + BL_PAGE_STR, + BL_PKG_IDX, + BL_DATA_STR, +}; + +enum raydium_bl_ack { + RAYDIUM_ACK_NULL = 0, + RAYDIUM_WAIT_READY, + R
[PATCH RESEND 4/8] mm: charge/uncharge kmemcg from generic page allocator paths
Currently, to charge a non-slab allocation to kmemcg one has to use alloc_kmem_pages helper with __GFP_ACCOUNT flag. A page allocated with this helper should finally be freed using free_kmem_pages, otherwise it won't be uncharged. This API suits its current users fine, but it turns out to be impossible to use along with page reference counting, i.e. when an allocation is supposed to be freed with put_page, as it is the case with pipe or unix socket buffers. To overcome this limitation, this patch moves charging/uncharging to generic page allocator paths, i.e. to __alloc_pages_nodemask and free_pages_prepare, and zaps alloc/free_kmem_pages helpers. This way, one can use any of the available page allocation functions to get the allocated page charged to kmemcg - it's enough to pass __GFP_ACCOUNT, just like in case of kmalloc and friends. A charged page will be automatically uncharged on free. To make it possible, we need to mark pages charged to kmemcg somehow. To avoid introducing a new page flag, we make use of page->_mapcount for marking such pages. Since pages charged to kmemcg are not supposed to be mapped to userspace, it should work just fine. There are other (ab)users of page->_mapcount - buddy and balloon pages - but we don't conflict with them. In case kmemcg is compiled out or not used at runtime, this patch introduces no overhead to generic page allocator paths. If kmemcg is used, it will be plus one gfp flags check on alloc and plus one page->_mapcount check on free, which shouldn't hurt performance, because the data accessed are hot. Signed-off-by: Vladimir Davydov --- include/linux/gfp.h| 10 +-- include/linux/page-flags.h | 7 + kernel/fork.c | 6 ++--- mm/page_alloc.c| 66 +- mm/slab_common.c | 2 +- mm/slub.c | 6 ++--- mm/vmalloc.c | 6 ++--- 7 files changed, 31 insertions(+), 72 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 570383a41853..c29e9d347bc6 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -78,8 +78,7 @@ struct vm_area_struct; * __GFP_THISNODE forces the allocation to be satisified from the requested * node with no fallbacks or placement policy enforcements. * - * __GFP_ACCOUNT causes the allocation to be accounted to kmemcg (only relevant - * to kmem allocations). + * __GFP_ACCOUNT causes the allocation to be accounted to kmemcg. 
*/ #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) #define __GFP_WRITE((__force gfp_t)___GFP_WRITE) @@ -486,10 +485,6 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_page_vma_node(gfp_mask, vma, addr, node) \ alloc_pages_vma(gfp_mask, 0, vma, addr, node, false) -extern struct page *alloc_kmem_pages(gfp_t gfp_mask, unsigned int order); -extern struct page *alloc_kmem_pages_node(int nid, gfp_t gfp_mask, - unsigned int order); - extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); @@ -513,9 +508,6 @@ extern void *__alloc_page_frag(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); extern void __free_page_frag(void *addr); -extern void __free_kmem_pages(struct page *page, unsigned int order); -extern void free_kmem_pages(unsigned long addr, unsigned int order); - #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9940ade6a25e..b51e75a47e82 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -630,6 +630,13 @@ PAGE_MAPCOUNT_OPS(Buddy, BUDDY) #define PAGE_BALLOON_MAPCOUNT_VALUE(-256) PAGE_MAPCOUNT_OPS(Balloon, BALLOON) +/* + * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on + * pages allocated with __GFP_ACCOUNT. It gets cleared on page free. + */ +#define PAGE_KMEMCG_MAPCOUNT_VALUE (-512) +PAGE_MAPCOUNT_OPS(Kmemcg, KMEMCG) + extern bool is_free_buddy_page(struct page *page); /* diff --git a/kernel/fork.c b/kernel/fork.c index 66cc2e0e137e..3f3c30f80786 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -162,8 +162,8 @@ void __weak arch_release_thread_info(struct thread_info *ti) static struct thread_info *alloc_thread_info_node(struct task_struct *tsk, int node) { - struct page *page = alloc_kmem_pages_node(node, THREADINFO_GFP, - THREAD_SIZE_ORDER); + struct page *page = alloc_pages_node(node, THREADINFO_GFP, +THREAD_SIZE_ORDER); if (page) memcg_kmem_update_page_stat(page, MEMCG_KERNEL_STACK, @@ -178,7 +178,7 @@ static inline void free_thread_info(struct thread
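The free-side hook that the changelog describes then reduces to a check like the following in free_pages_prepare() (a minimal sketch of the idea, not the literal hunk):

	if (memcg_kmem_enabled() && PageKmemcg(page)) {
		memcg_kmem_uncharge(page, order);
		__ClearPageKmemcg(page);
	}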
Re: Bisection: Lost wakeups from b5179ac70de8
On Mon, May 23, 2016 at 07:04:10AM -0700, Paul E. McKenney wrote: > Hello, Peter, > > Current mainline doesn't do well with RCU torture testing, and the > symptom once again looks like lost wakeups. Thankfully, this time each > run takes only about an hour, and the false-positive/-negative rate > is negligible. This means that for the first time ever, "git bisect" > actually did something useful for me. The first bad commit is: > > b5179ac70de8 ("sched/fair: Prepare to fix fairness problems on migration"). > Yeah, we have a patch for that.. I'll go write a Changelog for it and put it in sched/urgent. See lkml.kernel.org/r/20160523091907.gd15...@worktop.ger.corp.intel.com
Re: [RFC PATCHv2] usb: USB Type-C Connector Class
On Mon, 2016-05-23 at 10:09 -0700, Guenter Roeck wrote: > On Mon, May 23, 2016 at 01:25:19PM +0200, Oliver Neukum wrote: > > On Mon, 2016-05-23 at 12:57 +0300, Heikki Krogerus wrote: > > > > A reset is a generic function, so it does not belong to specific > > drivers. > > > I would expect the driver to execute the reset. > > Maybe the question should be phrased differently: Even UCSI (which > doesn't provide for everything) has commands to reset the policy > manager and to reset the connector. The class should provide a means > to execute those commands. Yes. > > So for Alternate Modes we need on a high level the following features > > > > 1. discovery of available Alternate Modes > > 2. selection of an Alternate Mode > > 3. notification about entering an Alternate Mode > > 4. triggering a reset > > 5. notification about resets > > > > 6. discovery about the current role > > 7. switching roles > > 8. setting preferred roles (Try.SRC and Try.SNK) > > > > Isn't reset and role handling orthogonal to alternate mode functionality ? > Both will still be needed even if alternate mode support is not implemented > at all. In part. A reset can cause the Alternate Mode to be left unexpectedly and unintentionally. So how many APIs do we want? Three: - Alternate Modes - USB PD - type C for roles and reset Or another number? > > I like your API as it is now. But it is incomplete. > > Same here. So what is to be done? Regards Oliver
Re: [git pull] check headers fix.
On Tue, May 24, 2016 at 07:29:25AM +0100, Dave Airlie wrote: > > Hi Linus, > > here is the C++ guards warning fix from Arnd. So why the hell do we have C++ guards in kernel headers?
Re: [PATCH 06/16] sched: Disable WAKE_AFFINE for asymmetric configurations
On 23 May 2016 at 12:58, Morten Rasmussen wrote: > If the system has cpus of different compute capacities (e.g. big.LITTLE) > let affine wakeups be constrained to cpus of the same type. Can you explain why you don't want wake affine with cpus with different compute capacity? > > cc: Ingo Molnar > cc: Peter Zijlstra > > Signed-off-by: Morten Rasmussen > --- > kernel/sched/core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index d9619a3..558ec4a 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6410,6 +6410,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu) > sd->idle_idx = 1; > } > > + if (sd->flags & SD_ASYM_CPUCAPACITY) > + sd->flags &= ~SD_WAKE_AFFINE; > + > sd->private = &tl->data; > > return sd; > -- > 1.9.1 >
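For reference, the effect of clearing SD_WAKE_AFFINE can be read off select_task_rq_fair(), which only records a domain as an affine-wakeup candidate when the flag is set; roughly (simplified from kernel/sched/fair.c of this era):

	for_each_domain(cpu, tmp) {
		/*
		 * If both cpu and prev_cpu are part of this domain,
		 * cpu is a valid SD_WAKE_AFFINE target.
		 */
		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
			affine_sd = tmp;
			break;
		}
		if (tmp->flags & sd_flag)
			sd = tmp;
		else if (!want_affine)
			break;
	}

With the flag cleared on asymmetric levels, affine_sd stays NULL and selection falls through to the slower find_idlest_group() path.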
[PATCH] brcmfmac: fix setting AP channel with new firmwares
Firmware for new chipsets is based on a new major version of code internally maintained at Broadcom. E.g. brcmfmac4366b-pcie.bin (used for BCM4366B1) is based on 10.10.69.3309 while brcmfmac43602-pcie.ap.bin was based on 7.35.177.56. Currently setting AP 5 GHz channel doesn't work reliably with BCM4366B1. When setting e.g. 36 control channel with VHT80 (center channel 42) firmware may randomly pick one of: 1) 52 control channel with 58 as center one 2) 100 control channel with 106 as center one 3) 116 control channel with 122 as center one 4) 149 control channel with 155 as center one It seems new firmwares require setting AP mode (BRCMF_C_SET_AP) before specifying a channel. Changing an order of firmware calls fixes the problem. This fix was verified with BCM4366B1 and tested for regressions on BCM43602. It's unclear if it's needed (or correct at all) for P2P interfaces so it leaves this code unaffected. Signed-off-by: Rafał Miłecki --- .../net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 16 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 299a404..3d09d23 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -4423,7 +4423,7 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev, struct brcmf_join_params join_params; enum nl80211_iftype dev_role; struct brcmf_fil_bss_enable_le bss_enable; - u16 chanspec; + u16 chanspec = chandef_to_chanspec(&cfg->d11inf, &settings->chandef); bool mbss; int is_11d; @@ -4499,16 +4499,16 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev, brcmf_config_ap_mgmt_ie(ifp->vif, &settings->beacon); - if (!mbss) { - chanspec = chandef_to_chanspec(&cfg->d11inf, - &settings->chandef); + if (dev_role == NL80211_IFTYPE_P2P_GO) { err = brcmf_fil_iovar_int_set(ifp, "chanspec", chanspec); if (err < 0) { brcmf_err("Set Channel failed: chspec=%d, %d\n", chanspec, err); goto exit; } + } + if (!mbss) { if (is_11d != ifp->vif->is_11d) { err = brcmf_fil_cmd_int_set(ifp, BRCMF_C_SET_REGULATORY, is_11d); @@ -4565,6 +4565,14 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev, brcmf_err("setting AP mode failed %d\n", err); goto exit; } + if (!mbss) { + err = brcmf_fil_iovar_int_set(ifp, "chanspec", chanspec); + if (err < 0) { + brcmf_err("Set Channel failed: chspec=%d, %d\n", + chanspec, err); + goto exit; + } + } err = brcmf_fil_cmd_int_set(ifp, BRCMF_C_UP, 1); if (err < 0) { brcmf_err("BRCMF_C_UP error (%d)\n", err); -- 1.8.4.5
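In other words, the effective call sequence for a non-P2P AP after this patch is (simplified from the diff above, error handling omitted):

	brcmf_fil_cmd_int_set(ifp, BRCMF_C_SET_AP, 1);		/* AP mode first */
	brcmf_fil_iovar_int_set(ifp, "chanspec", chanspec);	/* then the channel */
	brcmf_fil_cmd_int_set(ifp, BRCMF_C_UP, 1);		/* then bring the BSS up */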
Re: [PATCH 0/4] dma-mapping: Constify dma_attrs
I think this is moving in the wrong direction. The right fix here is to get rid of all the dma_attrs boilerplate code and just replace it with a simple enum dma_flags. This would simplify both the callers and, most importantly, the wrappers for the flag-less versions a lot.
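For illustration, such a conversion could look like the sketch below; the names are hypothetical and only meant to make the suggestion concrete, not an existing kernel API:

/* Hypothetical sketch of the suggested enum dma_flags. */
enum dma_flags {
	DMA_F_WRITE_BARRIER	= 1 << 0,
	DMA_F_WEAK_ORDERING	= 1 << 1,
	DMA_F_NON_CONSISTENT	= 1 << 2,
	DMA_F_SKIP_CPU_SYNC	= 1 << 3,
	/* ... one bit per former struct dma_attrs attribute ... */
};

void *dma_alloc_attrs(struct device *dev, size_t size,
		      dma_addr_t *dma_handle, gfp_t gfp,
		      enum dma_flags flags);

The flag-less wrappers would then just pass 0, instead of needing the init_dma_attrs()/dma_set_attr() dance.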
Re: [PATCH] NVMe: Only release requested regions
On Tue, May 10, 2016 at 03:14:28PM +0200, Johannes Thumshirn wrote: > The NVMe driver only requests the PCIe device's memory regions but releases > all possible regions (including eventual I/O regions). This leads to a stale > warning entry in dmesg about freeing non existent resources. > > Signed-off-by: Johannes Thumshirn > --- > drivers/nvme/host/pci.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index eec73fe..6f5ad07 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -1759,9 +1759,14 @@ static int nvme_pci_enable(struct nvme_dev *dev) > > static void nvme_dev_unmap(struct nvme_dev *dev) > { > + struct pci_dev *pdev = to_pci_dev(dev->dev); > + int bars; > + > if (dev->bar) > iounmap(dev->bar); > - pci_release_regions(to_pci_dev(dev->dev)); > + > + bars = pci_select_bars(pdev, IORESOURCE_MEM); > + pci_release_selected_regions(pdev, bars); > } > > static void nvme_pci_disable(struct nvme_dev *dev) > @@ -1998,7 +2003,7 @@ static int nvme_dev_map(struct nvme_dev *dev) > > return 0; >release: > - pci_release_regions(pdev); > + pci_release_selected_regions(pdev, bars); > return -ENODEV; > } > > -- > 1.8.5.6 > Keith, Jens, any opinions? As I've probably missed v4.7, is it possible to get it for v4.8? Or should I take on the PCI helper functions Christoph suggested first? Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCHv3] support for AD5820 camera auto-focus coil
On 24.05.2016 12:04, Pavel Machek wrote: Hi! +static int ad5820_registered(struct v4l2_subdev *subdev) +{ + struct ad5820_device *coil = to_ad5820_device(subdev); + struct i2c_client *client = v4l2_get_subdevdata(subdev); + + coil->vana = regulator_get(&client->dev, "VANA"); devm_regulator_get()? I'd rather avoid devm_ here. Driver is simple enough to allow it. Now thinking about it, what would happen here if regulator_get() returns -EPROBE_DEFER? Wouldn't it be better to move regulator_get to the probe() function, something like: static int ad5820_probe(struct i2c_client *client, const struct i2c_device_id *devid) { struct ad5820_device *coil; int ret = 0; coil = devm_kzalloc(&client->dev, sizeof(*coil), GFP_KERNEL); if (coil == NULL) return -ENOMEM; coil->vana = devm_regulator_get(&client->dev, "VANA"); if (IS_ERR(coil->vana)) { ret = PTR_ERR(coil->vana); if (ret != -EPROBE_DEFER) dev_err(&client->dev, "could not get regulator for vana\n"); return ret; } mutex_init(&coil->power_lock); ... with the appropriate changes to remove() because of the devm API usage. +#define AD5820_RAMP_MODE_LINEAR(0 << 3) +#define AD5820_RAMP_MODE_64_16 (1 << 3) + +struct ad5820_platform_data { + int (*set_xshutdown)(struct v4l2_subdev *subdev, int set); +}; + +#define to_ad5820_device(sd) container_of(sd, struct ad5820_device, subdev) + +struct ad5820_device { + struct v4l2_subdev subdev; + struct ad5820_platform_data *platform_data; + struct regulator *vana; + + struct v4l2_ctrl_handler ctrls; + u32 focus_absolute; + u32 focus_ramp_time; + u32 focus_ramp_mode; + + struct mutex power_lock; + int power_count; + + int standby : 1; +}; + The same for struct ad5820_device, is it really part of the public API? Let me check what can be done with it. Pavel
Re: [PATCH v7 4/4] CMDQ: suspend/resume protection
On Mon, 2016-05-23 at 20:23 +0800, HS Liao wrote: > Add suspend/resume protection mechanism to prevent active task(s) in > suspend. > > Signed-off-by: HS Liao > --- > drivers/soc/mediatek/mtk-cmdq.c | 174 > ++-- > 1 file changed, 166 insertions(+), 8 deletions(-) > > diff --git a/drivers/soc/mediatek/mtk-cmdq.c b/drivers/soc/mediatek/mtk-cmdq.c > index f8c5d02..1a51cfb 100644 > --- a/drivers/soc/mediatek/mtk-cmdq.c > +++ b/drivers/soc/mediatek/mtk-cmdq.c > @@ -39,6 +39,7 @@ > #define CMDQ_CLK_NAME"gce" > > #define CMDQ_CURR_IRQ_STATUS 0x010 > +#define CMDQ_CURR_LOADED_THR 0x018 > #define CMDQ_THR_SLOT_CYCLES 0x030 > > #define CMDQ_THR_BASE0x100 > @@ -125,6 +126,7 @@ enum cmdq_code { > > enum cmdq_task_state { > TASK_STATE_BUSY,/* task running on a thread */ > + TASK_STATE_KILLED, /* task process being killed */ > TASK_STATE_ERROR, /* task execution error */ > TASK_STATE_DONE,/* task finished */ > }; > @@ -161,8 +163,12 @@ struct cmdq { > u32 irq; > struct workqueue_struct *task_release_wq; > struct cmdq_thread thread[CMDQ_THR_MAX_COUNT]; > - spinlock_t exec_lock; /* for exec task */ > + atomic_tthread_usage; > + struct mutextask_mutex; /* for task */ > + spinlock_t exec_lock; /* for exec */ > struct clk *clock; > + boolsuspending; > + boolsuspended; > }; > > struct cmdq_subsys { > @@ -196,15 +202,27 @@ static int cmdq_eng_get_thread(u64 flag) > return CMDQ_THR_DISP_MISC_IDX; > } > > -static void cmdq_task_release(struct cmdq_task *task) > +static void cmdq_task_release_unlocked(struct cmdq_task *task) > { > struct cmdq *cmdq = task->cmdq; > > + /* This func should be inside cmdq->task_mutex mutex */ > + lockdep_assert_held(&cmdq->task_mutex); > + > dma_free_coherent(cmdq->dev, task->command_size, task->va_base, > task->mva_base); > kfree(task); > } > > +static void cmdq_task_release(struct cmdq_task *task) > +{ > + struct cmdq *cmdq = task->cmdq; > + > + mutex_lock(&cmdq->task_mutex); > + cmdq_task_release_unlocked(task); > + mutex_unlock(&cmdq->task_mutex); > +} > + > static struct cmdq_task *cmdq_task_acquire(struct cmdq_rec *rec, > struct cmdq_task_cb cb) > { > @@ -576,6 +594,12 @@ static int cmdq_task_wait_and_release(struct cmdq_task > *task) > dev_dbg(dev, "timeout!\n"); > > spin_lock_irqsave(&cmdq->exec_lock, flags); > + > + if (cmdq->suspending && task->task_state == TASK_STATE_KILLED) { > + spin_unlock_irqrestore(&cmdq->exec_lock, flags); > + return 0; > + } > + > if (task->task_state != TASK_STATE_DONE) > err = cmdq_task_handle_error_result(task); > if (list_empty(&thread->task_busy_list)) > @@ -584,7 +608,9 @@ static int cmdq_task_wait_and_release(struct cmdq_task > *task) > > /* release regardless of success or not */ > clk_disable_unprepare(cmdq->clock); > - cmdq_task_release(task); > + atomic_dec(&cmdq->thread_usage); > + if (!(task->cmdq->suspending && task->task_state == TASK_STATE_KILLED)) > + cmdq_task_release(task); > > return err; > } > @@ -597,12 +623,28 @@ static void cmdq_task_wait_release_work(struct > work_struct *work_item) > cmdq_task_wait_and_release(task); > } > > -static void cmdq_task_wait_release_schedule(struct cmdq_task *task) > +static void cmdq_task_wait_release_schedule(struct cmdq *cmdq, > + struct cmdq_task *task) > { > - struct cmdq *cmdq = task->cmdq; > + unsigned long flags; > + > + spin_lock_irqsave(&cmdq->exec_lock, flags); > + > + if (cmdq->suspending || cmdq->suspended) { > + /* > + * This means system is suspened between > + * cmdq_task_submit_async() and > + * cmdq_task_wait_release_schedule(), so return immediately. 
> + * This task should be forced to remove by suspend flow. > + */ > + spin_unlock_irqrestore(&cmdq->exec_lock, flags); > + return; > + } > > INIT_WORK(&task->release_work, cmdq_task_wait_release_work); > queue_work(cmdq->task_release_wq, &task->release_work); > + > + spin_unlock_irqrestore(&cmdq->exec_lock, flags); > } > > static int cmdq_rec_realloc_cmd_buffer(struct cmdq_rec *rec, size_t size) > @@ -766,18 +808,31 @@ static int _cmdq_rec_flush(struct cmdq_rec *rec, struct > cmdq_task **task_out, > struct cmdq_thread *thread; > int err; > > + mutex_lock(&cmdq->task_mutex); > + if (rec->cmdq->suspending || rec-
Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom
On Tue, May 24, 2016 at 10:47:37AM +0200, Michal Hocko wrote: > On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > > mem_cgroup_oom may be invoked multiple times while a process is handling > > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > > leaking the previously taken css reference. > > > > > > Have you seen this happening? I was under impression that the page fault > > > paths that have oom enabled will not retry allocations. > > > > filemap_fault will, for readahead. > > I thought that the readahead is __GFP_NORETRY so we do not trigger OOM > killer. Hmm, interesting. We do allocate readahead pages with __GFP_NORETRY, but we add them to page cache and hence charge with GFP_KERNEL or GFP_NOFS mask, see __do_page_cache_readahead -> read_pages.
Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom
On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > mem_cgroup_oom may be invoked multiple times while a process is handling > > a page fault, in which case current->memcg_in_oom will be overwritten > > leaking the previously taken css reference. > > Have you seen this happening? I was under impression that the page fault > paths that have oom enabled will not retry allocations. filemap_fault will, for readahead. This is rather unlikely, just like the whole oom scenario, so I haven't faced this leak in production yet, although it's pretty easy to reproduce using a contrived test. However, even if this leak happened on my host, I would probably not notice, because currently we have no clear means of catching css leaks. I'm thinking about adding a file to debugfs containing brief information about all memory cgroups, including dead ones, so that we could at least see how many dead memory cgroups are dangling out there. > > > Signed-off-by: Vladimir Davydov > > That being said I do not have anything against the patch. It is a good > safety net I am just not sure this might happen right now and so the > patch is not stable candidate. > > After clarification > Acked-by: Michal Hocko Thanks.
[PATCHv4] support for AD5820 camera auto-focus coil
This adds support for AD5820 autofocus coil, found for example in Nokia N900 smartphone. Signed-off-by: Pavel Machek --- v2: simple cleanups, fix error paths, simplify probe v3: more cleanups, remove printk, add include v4: remove header file. diff --git a/drivers/media/i2c/Kconfig b/drivers/media/i2c/Kconfig index 993dc50..77313a1 100644 --- a/drivers/media/i2c/Kconfig +++ b/drivers/media/i2c/Kconfig @@ -279,6 +279,13 @@ config VIDEO_ML86V7667 To compile this driver as a module, choose M here: the module will be called ml86v7667. +config VIDEO_AD5820 + tristate "AD5820 lens voice coil support" + depends on I2C && VIDEO_V4L2 && MEDIA_CONTROLLER + ---help--- + This is a driver for the AD5820 camera lens voice coil. + It is used for example in Nokia N900 (RX-51). + config VIDEO_SAA7110 tristate "Philips SAA7110 video decoder" depends on VIDEO_V4L2 && I2C diff --git a/drivers/media/i2c/Makefile b/drivers/media/i2c/Makefile index 94f2c99..34434ae 100644 --- a/drivers/media/i2c/Makefile +++ b/drivers/media/i2c/Makefile @@ -19,6 +20,7 @@ obj-$(CONFIG_VIDEO_SAA717X) += saa717x.o obj-$(CONFIG_VIDEO_SAA7127) += saa7127.o obj-$(CONFIG_VIDEO_SAA7185) += saa7185.o obj-$(CONFIG_VIDEO_SAA6752HS) += saa6752hs.o +obj-$(CONFIG_VIDEO_AD5820) += ad5820.o obj-$(CONFIG_VIDEO_ADV7170) += adv7170.o obj-$(CONFIG_VIDEO_ADV7175) += adv7175.o obj-$(CONFIG_VIDEO_ADV7180) += adv7180.o diff --git a/drivers/media/i2c/ad5820.c b/drivers/media/i2c/ad5820.c new file mode 100644 index 000..f956bd3 --- /dev/null +++ b/drivers/media/i2c/ad5820.c @@ -0,0 +1,438 @@ +/* + * drivers/media/i2c/ad5820.c + * + * AD5820 DAC driver for camera voice coil focus. + * + * Copyright (C) 2008 Nokia Corporation + * Copyright (C) 2007 Texas Instruments + * Copyright (C) 2016 Pavel Machek + * + * Contact: Tuukka Toivonen + * Sakari Ailus + * + * Based on af_d88.c by Texas Instruments. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include +#include +#include +#include +#include + +#include +#include +#include + +#define AD5820_NAME"ad5820" + +/* Register definitions */ +#define AD5820_POWER_DOWN (1 << 15) +#define AD5820_DAC_SHIFT 4 +#define AD5820_RAMP_MODE_LINEAR(0 << 3) +#define AD5820_RAMP_MODE_64_16 (1 << 3) + +#define CODE_TO_RAMP_US(s) ((s) == 0 ? 0 : (1 << ((s) - 1)) * 50) +#define RAMP_US_TO_CODE(c) fls(((c) + ((c)>>1)) / 50) + +#define to_ad5820_device(sd) container_of(sd, struct ad5820_device, subdev) + +struct ad5820_device { + struct v4l2_subdev subdev; + struct ad5820_platform_data *platform_data; + struct regulator *vana; + + struct v4l2_ctrl_handler ctrls; + u32 focus_absolute; + u32 focus_ramp_time; + u32 focus_ramp_mode; + + struct mutex power_lock; + int power_count; + + int standby : 1; +}; + +/** + * @brief I2C write using i2c_transfer(). 
+ * @param coil - the driver data structure + * @param data - register value to be written + * @returns nonnegative on success, negative if failed + */ +static int ad5820_write(struct ad5820_device *coil, u16 data) +{ + struct i2c_client *client = v4l2_get_subdevdata(&coil->subdev); + struct i2c_msg msg; + int r; + + if (!client->adapter) + return -ENODEV; + + data = cpu_to_be16(data); + msg.addr = client->addr; + msg.flags = 0; + msg.len = 2; + msg.buf = (u8 *)&data; + + r = i2c_transfer(client->adapter, &msg, 1); + if (r < 0) { + dev_err(&client->dev, "write failed, error %d\n", r); + return r; + } + + return 0; +} + +/* + * Calculate status word and write it to the device based on current + * values of V4L2 controls. It is assumed that the stored V4L2 control + * values are properly limited and rounded. + */ +static int ad5820_update_hw(struct ad5820_device *coil) +{ + u16 status; + + status = RAMP_US_TO_CODE(coil->focus_ramp_time); + status |= coil->focus_ramp_mode + ? AD5820_RAMP_MODE_64_16 : AD5820_RAMP_MODE_LINEAR; + status |= coil->focus_absolute << AD5820_DAC_SHIFT; + + if (coil->standby) + status |= AD5820_POWER_DOWN; + + return ad5820_write(coil, status); +} + +/* + * Power handling + */ +static int ad5820_power_off(struct ad5820_device *coil, int standby) +{ + int ret = 0; + + /* +* Go to standby first as real power off my be denied by the hardware +
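As a quick sanity check of the ramp-time conversion macros defined above (my own arithmetic, not part of the patch), the two macros are inverses over the representable values:

/* CODE_TO_RAMP_US(4)   = (1 << (4 - 1)) * 50        = 400 us
 * RAMP_US_TO_CODE(400) = fls((400 + 400/2) / 50)
 *                      = fls(12)                    = 4
 * Adding half the value before dividing rounds to the nearest
 * power-of-two step in the log2 domain. */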
Re: [PATCH] NVMe: Only release requested regions
On Tue, May 24, 2016 at 11:15:52AM +0200, Johannes Thumshirn wrote: > As I've probably missed v4.7, is it possible to get it for v4.8? > Or should I take on the PCI helper functions Christoph suggested first? Let's get the quick fix in first, and I think it's still 4.7 material.
Re: [PATCH] hwrng: stm32 - fix build warning
2016-05-24 10:58 GMT+02:00 Arnd Bergmann : > On Tuesday, May 24, 2016 10:50:17 AM CEST Maxime Coquelin wrote: >> diff --git a/drivers/char/hw_random/stm32-rng.c >> b/drivers/char/hw_random/stm32-rng.c >> index 92a810648bd0..2a0fc90e4dc3 100644 >> --- a/drivers/char/hw_random/stm32-rng.c >> +++ b/drivers/char/hw_random/stm32-rng.c >> @@ -68,6 +68,10 @@ static int stm32_rng_read(struct hwrng *rng, void >> *data, size_t max, bool wait) >> } while (!sr && --timeout); >> } >> >> + if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), >> + "bad RNG status - %x\n", sr)) >> + writel_relaxed(0, priv->base + RNG_SR); >> + >> /* If error detected or data not ready... */ >> if (sr != RNG_SR_DRDY) >> break; >> @@ -79,10 +83,6 @@ static int stm32_rng_read(struct hwrng *rng, void >> *data, size_t max, bool wait) >> max -= sizeof(u32); >> } >> >> - if (WARN_ONCE(sr & (RNG_SR_SEIS | RNG_SR_CEIS), >> - "bad RNG status - %x\n", sr)) >> - writel_relaxed(0, priv->base + RNG_SR); >> - >> pm_runtime_mark_last_busy((struct device *) priv->rng.priv); >> pm_runtime_put_sync_autosuspend((struct device *) priv->rng.priv); >> >> Thanks, >> > > Yes, that looks good to me. Thanks! Sudip, do you want to send the patch, or shall I do it? Maxime
[PATCH v5 3/5] perf callchain: Add support for cross-platform unwind
Use thread-specific unwind ops to unwind cross-platform callchains. Currently, the unwind methods are only suitable for local unwind; this patch changes the fixed methods to be thread/map related. Each time a map is inserted, we find the target arch and check whether this platform can be remote-unwound. We test for the x86 platform and only show proper messages. The real unwind methods are not implemented yet; they will be introduced in the next patch. CONFIG_LIBUNWIND/NO_LIBUNWIND are changed to CONFIG_LOCAL_LIBUNWIND/NO_LOCAL_LIBUNWIND to retain the local unwind features. CONFIG_LIBUNWIND stands for local or remote or both unwind being supported, and NO_LIBUNWIND means neither local nor remote libunwind is supported. Signed-off-by: He Kuang --- tools/perf/arch/x86/util/Build| 2 +- tools/perf/config/Makefile| 23 +- tools/perf/util/Build | 2 +- tools/perf/util/thread.c | 5 +-- tools/perf/util/thread.h | 17 ++-- tools/perf/util/unwind-libunwind.c| 49 + tools/perf/util/unwind-libunwind_common.c | 71 +-- tools/perf/util/unwind.h | 37 +++- 8 files changed, 173 insertions(+), 33 deletions(-) diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build index 4659703..bc24b75 100644 --- a/tools/perf/arch/x86/util/Build +++ b/tools/perf/arch/x86/util/Build @@ -7,7 +7,7 @@ libperf-y += perf_regs.o libperf-$(CONFIG_DWARF) += dwarf-regs.o libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o -libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o +libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind.o libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o libperf-$(CONFIG_AUXTRACE) += auxtrace.o diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index c9e1625..8ac0440 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -354,15 +354,31 @@ ifeq ($(ARCH),powerpc) endif ifndef NO_LIBUNWIND + have_libunwind = ifeq ($(feature-libunwind-x86), 1) $(call detected,CONFIG_LIBUNWIND_X86) CFLAGS += -DHAVE_LIBUNWIND_X86_SUPPORT +LDFLAGS += -lunwind-x86 +have_libunwind = 1 endif ifneq ($(feature-libunwind), 1) msg := $(warning No libunwind found. 
Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR); +NO_LOCAL_LIBUNWIND := 1 + else +have_libunwind = 1 +CFLAGS += -DHAVE_LIBUNWIND_LOCAL_SUPPORT +$(call detected,CONFIG_LOCAL_LIBUNWIND) + endif + + ifneq ($(have_libunwind), 1) NO_LIBUNWIND := 1 + else +CFLAGS += -I$(LIBUNWIND_DIR)/include +LDFLAGS += -L$(LIBUNWIND_DIR)/lib endif +else + NO_LOCAL_LIBUNWIND := 1 endif ifndef NO_LIBBPF @@ -400,7 +416,7 @@ else NO_DWARF_UNWIND := 1 endif -ifndef NO_LIBUNWIND +ifndef NO_LOCAL_LIBUNWIND ifeq ($(ARCH),$(filter $(ARCH),arm arm64)) $(call feature_check,libunwind-debug-frame) ifneq ($(feature-libunwind-debug-frame), 1) @@ -411,12 +427,15 @@ ifndef NO_LIBUNWIND # non-ARM has no dwarf_find_debug_frame() function: CFLAGS += -DNO_LIBUNWIND_DEBUG_FRAME endif - CFLAGS += -DHAVE_LIBUNWIND_SUPPORT EXTLIBS += $(LIBUNWIND_LIBS) CFLAGS += $(LIBUNWIND_CFLAGS) LDFLAGS += $(LIBUNWIND_LDFLAGS) endif +ifndef NO_LIBUNWIND + CFLAGS += -DHAVE_LIBUNWIND_SUPPORT +endif + ifndef NO_LIBAUDIT ifneq ($(feature-libaudit), 1) msg := $(warning No libaudit.h found, disables 'trace' tool, please install audit-libs-devel or libaudit-dev); diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 25c31fb..ce69721 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -99,7 +99,7 @@ libperf-$(CONFIG_DWARF) += probe-finder.o libperf-$(CONFIG_DWARF) += dwarf-aux.o libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o -libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o +libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind.o libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind_common.o libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c index 3043113..4e1aaf5 100644 --- a/tools/perf/util/thread.c +++ b/tools/perf/util/thread.c @@ -43,9 +43,6 @@ struct thread *thread__new(pid_t pid, pid_t tid) thread->cpu = -1; INIT_LIST_HEAD(&thread->comm_list); - if (unwind__prepare_access(thread) < 0) - goto err_thread; - comm_str = malloc(32); if (!comm_str) goto err_thread; @@ -59,6 +56,8 @@ struct thread *thread__new(pid_t pid, pid_t tid) list_add(&comm->list, &thread->comm_list); atomic_set(&thread->refcnt, 1); RB_CLEAR_NODE(&thread->rb_node); + + register_null_unwind_libunwind_ops(thread); } return thread; diff --git a/tools/perf/util/thread.h b/tools/perf/u
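The ops indirection this patch prepares boils down to something like the following shape (a sketch of the intent; the concrete callbacks land with the real unwind methods in the next patch):

/* Sketch only: per-thread unwind operations, selected by target arch. */
struct unwind_libunwind_ops {
	int (*prepare_access)(struct thread *thread);
	void (*flush_access)(struct thread *thread);
	void (*finish_access)(struct thread *thread);
	int (*get_entries)(unwind_entry_cb_t cb, void *arg,
			   struct thread *thread,
			   struct perf_sample *data, int max_stack);
};

Each supported target would register its own instance (e.g. _Ux86_unwind_libunwind_ops in the later patches), while threads whose maps belong to an unsupported target get the null ops registered by this patch.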
[PATCH RESEND 1/8] mm: remove pointless struct in struct page definition
... to reduce indentation level thus leaving more space for comments. Signed-off-by: Vladimir Davydov --- include/linux/mm_types.h | 68 +++- 1 file changed, 32 insertions(+), 36 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d553855503e6..3cc5977a9cab 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -60,51 +60,47 @@ struct page { }; /* Second double word */ - struct { - union { - pgoff_t index; /* Our offset within mapping. */ - void *freelist; /* sl[aou]b first free object */ - /* page_deferred_list().prev-- second tail page */ - }; + union { + pgoff_t index; /* Our offset within mapping. */ + void *freelist; /* sl[aou]b first free object */ + /* page_deferred_list().prev-- second tail page */ + }; - union { + union { #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \ defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) - /* Used for cmpxchg_double in slub */ - unsigned long counters; + /* Used for cmpxchg_double in slub */ + unsigned long counters; #else - /* -* Keep _refcount separate from slub cmpxchg_double -* data. As the rest of the double word is protected by -* slab_lock but _refcount is not. -*/ - unsigned counters; + /* +* Keep _refcount separate from slub cmpxchg_double data. +* As the rest of the double word is protected by slab_lock +* but _refcount is not. +*/ + unsigned counters; #endif + struct { - struct { - - union { - /* -* Count of ptes mapped in mms, to show -* when page is mapped & limit reverse -* map searches. -*/ - atomic_t _mapcount; - - struct { /* SLUB */ - unsigned inuse:16; - unsigned objects:15; - unsigned frozen:1; - }; - int units; /* SLOB */ - }; + union { /* -* Usage count, *USE WRAPPER FUNCTION* -* when manual accounting. See page_ref.h +* Count of ptes mapped in mms, to show when +* page is mapped & limit reverse map searches. */ - atomic_t _refcount; + atomic_t _mapcount; + + unsigned int active;/* SLAB */ + struct {/* SLUB */ + unsigned inuse:16; + unsigned objects:15; + unsigned frozen:1; + }; + int units; /* SLOB */ }; - unsigned int active;/* SLAB */ + /* +* Usage count, *USE WRAPPER FUNCTION* when manual +* accounting. See page_ref.h +*/ + atomic_t _refcount; }; }; -- 2.1.4
[PATCH v5 5/5] perf callchain: Support aarch64 cross-platform
Support aarch64 cross platform callchain unwind. Signed-off-by: He Kuang --- .../perf/arch/arm64/include/libunwind/libunwind-arch.h | 18 ++ tools/perf/arch/arm64/util/unwind-libunwind.c | 5 - tools/perf/config/Makefile | 12 tools/perf/util/Build | 4 tools/perf/util/unwind-libunwind_common.c | 10 ++ tools/perf/util/unwind.h | 3 +++ 6 files changed, 51 insertions(+), 1 deletion(-) create mode 100644 tools/perf/arch/arm64/include/libunwind/libunwind-arch.h diff --git a/tools/perf/arch/arm64/include/libunwind/libunwind-arch.h b/tools/perf/arch/arm64/include/libunwind/libunwind-arch.h new file mode 100644 index 000..47d13a6 --- /dev/null +++ b/tools/perf/arch/arm64/include/libunwind/libunwind-arch.h @@ -0,0 +1,18 @@ +#ifndef _LIBUNWIND_ARCH_H +#define _LIBUNWIND_ARCH_H + +#include +#include <../perf_regs.h> +#include <../../../../../../arch/arm64/include/uapi/asm/perf_regs.h> + +#define LIBUNWIND_AARCH64 +int libunwind__aarch64_reg_id(int regnum); + +#define LIBUNWIND__ARCH_REG_ID libunwind__aarch64_reg_id + +#include <../../../arm64/util/unwind-libunwind.c> + +#define UNWT_PREFIXUNW_PASTE(UNW_PASTE(_U, aarch64), _) +#define UNWT_OBJ(fn) UNW_PASTE(UNWT_PREFIX, fn) + +#endif /* _LIBUNWIND_ARCH_H */ diff --git a/tools/perf/arch/arm64/util/unwind-libunwind.c b/tools/perf/arch/arm64/util/unwind-libunwind.c index a87afa9..5b557a5 100644 --- a/tools/perf/arch/arm64/util/unwind-libunwind.c +++ b/tools/perf/arch/arm64/util/unwind-libunwind.c @@ -1,11 +1,14 @@ #include -#include #include "perf_regs.h" #include "../../util/unwind.h" #include "../../util/debug.h" +#ifndef LIBUNWIND_AARCH64 int libunwind__arch_reg_id(int regnum) +#else +int libunwind__aarch64_reg_id(int regnum) +#endif { switch (regnum) { case UNW_AARCH64_X0: diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 8ac0440..eb7cbce 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -362,6 +362,18 @@ ifndef NO_LIBUNWIND have_libunwind = 1 endif + ifeq ($(feature-libunwind-aarch64), 1) +$(call detected,CONFIG_LIBUNWIND_AARCH64) +CFLAGS += -DHAVE_LIBUNWIND_AARCH64_SUPPORT +LDFLAGS += -lunwind-aarch64 +have_libunwind = 1 +$(call feature_check,libunwind-debug-frame-aarch64) +ifneq ($(feature-libunwind-debug-frame-aarch64), 1) + msg := $(warning No debug_frame support found in libunwind-aarch64); + CFLAGS += -DNO_LIBUNWIND_DEBUG_FRAME_AARCH64 +endif + endif + ifneq ($(feature-libunwind), 1) msg := $(warning No libunwind found. 
Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR); NO_LOCAL_LIBUNWIND := 1 diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 2373130..f1b51a2 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -104,10 +104,14 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind.o libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind_common.o libperf-$(CONFIG_LIBUNWIND_X86) += unwind-libunwind_x86_32.o +libperf-$(CONFIG_LIBUNWIND_AARCH64) += unwind-libunwind_arm64.o $(OUTPUT)util/unwind-libunwind_x86_32.o: util/unwind-libunwind.c arch/x86/util/unwind-libunwind.c $(QUIET_CC)$(CC) $(CFLAGS) -DREMOTE_UNWIND_LIBUNWIND -Iarch/x86/include/libunwind -c -o $@ util/unwind-libunwind.c +$(OUTPUT)util/unwind-libunwind_arm64.o: util/unwind-libunwind.c arch/arm64/util/unwind-libunwind.c + $(QUIET_CC)$(CC) $(CFLAGS) -DREMOTE_UNWIND_LIBUNWIND -Iarch/arm64/include/libunwind -c -o $@ util/unwind-libunwind.c + libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o libperf-y += scripting-engines/ diff --git a/tools/perf/util/unwind-libunwind_common.c b/tools/perf/util/unwind-libunwind_common.c index 619c6c0..d19b062 100644 --- a/tools/perf/util/unwind-libunwind_common.c +++ b/tools/perf/util/unwind-libunwind_common.c @@ -89,6 +89,16 @@ void unwind__get_arch(struct thread *thread, struct map *map) #endif use_local_unwind = 0; } + } else if (!strcmp(arch, "arm64") || !strcmp(arch, "arm")) { + if (dso_type == DSO__TYPE_64BIT) { +#ifdef HAVE_LIBUNWIND_AARCH64_SUPPORT + register_unwind_libunwind_ops( + &_Uaarch64_unwind_libunwind_ops, thread); +#else + register_null_unwind_libunwind_ops(thread); +#endif + use_local_unwind = 0; + } } if (use_local_unwind) diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h index 7dafb6e..359f756 100644 --- a/tools/perf/util/unwind.h +++ b/tools/perf/util/unwind.h @@ -60,6 +60,9 @@ register_null_unwind_libunwind_ops(struct thread *thread __maybe_unused) {} #ifdef HAVE_LIBUNWIND_X86_SUPPORT e
Re: [PATCH] mm: memcontrol: fix possible css ref leak on oom
On Tue 24-05-16 12:01:42, Vladimir Davydov wrote: > On Tue, May 24, 2016 at 10:47:37AM +0200, Michal Hocko wrote: > > On Tue 24-05-16 11:43:19, Vladimir Davydov wrote: > > > On Mon, May 23, 2016 at 07:44:43PM +0200, Michal Hocko wrote: > > > > On Mon 23-05-16 19:02:10, Vladimir Davydov wrote: > > > > > mem_cgroup_oom may be invoked multiple times while a process is > > > > > handling > > > > > a page fault, in which case current->memcg_in_oom will be overwritten > > > > > leaking the previously taken css reference. > > > > > > > > Have you seen this happening? I was under impression that the page fault > > > > paths that have oom enabled will not retry allocations. > > > > > > filemap_fault will, for readahead. > > > > I thought that the readahead is __GFP_NORETRY so we do not trigger OOM > > killer. > > Hmm, interesting. We do allocate readahead pages with __GFP_NORETRY, but > we add them to page cache and hence charge with GFP_KERNEL or GFP_NOFS > mask, see __do_page_cache_readahaed -> read_pages. I guess we do not want to trigger OOM just because of readahead. What do you think about the following? I will cook up a full patch if this (untested) looks ok. --- diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 97354102794d..81363b834900 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -209,10 +209,10 @@ static inline struct page *page_cache_alloc_cold(struct address_space *x) return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD); } -static inline struct page *page_cache_alloc_readahead(struct address_space *x) +static inline gfp_t readahead_gfp_mask(struct address_space *x) { - return __page_cache_alloc(mapping_gfp_mask(x) | - __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN); + return mapping_gfp_mask(x) | + __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN; } typedef int filler_t(void *, struct page *); diff --git a/mm/readahead.c b/mm/readahead.c index 40be3ae0afe3..7431fefe4ede 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -108,7 +108,7 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages, EXPORT_SYMBOL(read_cache_pages); static int read_pages(struct address_space *mapping, struct file *filp, - struct list_head *pages, unsigned nr_pages) + struct list_head *pages, unsigned nr_pages, gfp_t gfp_mask) { struct blk_plug plug; unsigned page_idx; @@ -126,8 +126,7 @@ static int read_pages(struct address_space *mapping, struct file *filp, for (page_idx = 0; page_idx < nr_pages; page_idx++) { struct page *page = lru_to_page(pages); list_del(&page->lru); - if (!add_to_page_cache_lru(page, mapping, page->index, - mapping_gfp_constraint(mapping, GFP_KERNEL))) { + if (!add_to_page_cache_lru(page, mapping, page->index, gfp_mask)) { mapping->a_ops->readpage(filp, page); } put_page(page); @@ -159,6 +158,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, int page_idx; int ret = 0; loff_t isize = i_size_read(inode); + gfp_t gfp_mask = readahead_gfp_mask(mapping); if (isize == 0) goto out; @@ -180,7 +180,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, if (page && !radix_tree_exceptional_entry(page)) continue; - page = page_cache_alloc_readahead(mapping); + page = __page_cache_alloc(gfp_mask); if (!page) break; page->index = page_offset; @@ -196,7 +196,7 @@ int __do_page_cache_readahead(struct address_space *mapping, struct file *filp, * will then handle the error. 
*/ if (ret) - read_pages(mapping, filp, &page_pool, ret); + read_pages(mapping, filp, &page_pool, ret, gfp_mask); BUG_ON(!list_empty(&page_pool)); out: return ret; -- Michal Hocko SUSE Labs
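If I read try_charge() right, the net effect for a typical GFP_KERNEL mapping would be (illustration, not part of the patch):

/* readahead_gfp_mask(x) == GFP_KERNEL | __GFP_COLD |
 *                          __GFP_NORETRY | __GFP_NOWARN
 * add_to_page_cache_lru() then charges with this same mask, and
 * try_charge() bails out with -ENOMEM before invoking the memcg OOM
 * killer when __GFP_NORETRY is set, so readahead alone can no longer
 * trigger a memcg OOM. */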
[PATCH v5 4/5] perf callchain: Support x86 target platform
Support x86(32-bit) cross platform callchain unwind. Signed-off-by: He Kuang --- .../perf/arch/x86/include/libunwind/libunwind-arch.h | 18 ++ tools/perf/arch/x86/util/unwind-libunwind.c | 19 ++- tools/perf/util/Build | 6 ++ tools/perf/util/unwind-libunwind_common.c | 6 -- tools/perf/util/unwind.h | 5 + 5 files changed, 47 insertions(+), 7 deletions(-) create mode 100644 tools/perf/arch/x86/include/libunwind/libunwind-arch.h diff --git a/tools/perf/arch/x86/include/libunwind/libunwind-arch.h b/tools/perf/arch/x86/include/libunwind/libunwind-arch.h new file mode 100644 index 000..be8c675 --- /dev/null +++ b/tools/perf/arch/x86/include/libunwind/libunwind-arch.h @@ -0,0 +1,18 @@ +#ifndef _LIBUNWIND_ARCH_H +#define _LIBUNWIND_ARCH_H + +#include +#include <../perf_regs.h> +#include <../../../../../../arch/x86/include/uapi/asm/perf_regs.h> + +#define LIBUNWIND_X86_32 +int libunwind__x86_reg_id(int regnum); + +#define LIBUNWIND__ARCH_REG_ID libunwind__x86_reg_id + +#include <../../../x86/util/unwind-libunwind.c> + +#define UNWT_PREFIXUNW_PASTE(UNW_PASTE(_U, x86), _) +#define UNWT_OBJ(fn) UNW_PASTE(UNWT_PREFIX, fn) + +#endif /* _LIBUNWIND_ARCH_H */ diff --git a/tools/perf/arch/x86/util/unwind-libunwind.c b/tools/perf/arch/x86/util/unwind-libunwind.c index db25e93..28831d8 100644 --- a/tools/perf/arch/x86/util/unwind-libunwind.c +++ b/tools/perf/arch/x86/util/unwind-libunwind.c @@ -1,12 +1,18 @@ #include +#if defined(LIBUNWIND_X86_32) +#include +#elif defined(LIBUNWIND_X86_64) +#include +#elif defined(HAVE_LIBUNWIND_LOCAL_SUPPORT) #include +#endif #include "perf_regs.h" #include "../../util/unwind.h" #include "../../util/debug.h" -#ifdef HAVE_ARCH_X86_64_SUPPORT -int libunwind__arch_reg_id(int regnum) +#if !defined(REMOTE_UNWIND_LIBUNWIND) && defined(HAVE_ARCH_X86_64_SUPPORT) +int LIBUNWIND__ARCH_REG_ID(int regnum) { int id; @@ -69,8 +75,11 @@ int libunwind__arch_reg_id(int regnum) return id; } -#else -int libunwind__arch_reg_id(int regnum) +#endif + +#if !defined(REMOTE_UNWIND_LIBUNWIND) && !defined(HAVE_ARCH_X86_64_SUPPORT) || \ + defined(LIBUNWIND_X86_32) +int LIBUNWIND__ARCH_REG_ID(int regnum) { int id; @@ -109,4 +118,4 @@ int libunwind__arch_reg_id(int regnum) return id; } -#endif /* HAVE_ARCH_X86_64_SUPPORT */ +#endif diff --git a/tools/perf/util/Build b/tools/perf/util/Build index ce69721..2373130 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -1,3 +1,5 @@ +include ../scripts/Makefile.include + libperf-y += alias.o libperf-y += annotate.o libperf-y += build-id.o @@ -101,6 +103,10 @@ libperf-$(CONFIG_DWARF) += dwarf-aux.o libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind.o libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind_common.o +libperf-$(CONFIG_LIBUNWIND_X86) += unwind-libunwind_x86_32.o + +$(OUTPUT)util/unwind-libunwind_x86_32.o: util/unwind-libunwind.c arch/x86/util/unwind-libunwind.c + $(QUIET_CC)$(CC) $(CFLAGS) -DREMOTE_UNWIND_LIBUNWIND -Iarch/x86/include/libunwind -c -o $@ util/unwind-libunwind.c libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o diff --git a/tools/perf/util/unwind-libunwind_common.c b/tools/perf/util/unwind-libunwind_common.c index f44833b..619c6c0 100644 --- a/tools/perf/util/unwind-libunwind_common.c +++ b/tools/perf/util/unwind-libunwind_common.c @@ -82,9 +82,11 @@ void unwind__get_arch(struct thread *thread, struct map *map) if (!strcmp(arch, "x86")) { if (dso_type != DSO__TYPE_64BIT) { #ifdef HAVE_LIBUNWIND_X86_SUPPORT - pr_err("unwind: target platform=%s is not implemented\n", arch); 
-#endif + register_unwind_libunwind_ops( + &_Ux86_unwind_libunwind_ops, thread); +#else register_null_unwind_libunwind_ops(thread); +#endif use_local_unwind = 0; } } diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h index e170be7..7dafb6e 100644 --- a/tools/perf/util/unwind.h +++ b/tools/perf/util/unwind.h @@ -56,6 +56,11 @@ static inline void unwind__get_arch(struct thread *thread __maybe_unused, static inline void register_null_unwind_libunwind_ops(struct thread *thread __maybe_unused) {} #endif + +#ifdef HAVE_LIBUNWIND_X86_SUPPORT +extern struct unwind_libunwind_ops _Ux86_unwind_libunwind_ops; +#endif + #else static inline int unwind__get_entries(unwind_entry_cb_t cb __maybe_unused, -- 1.8.5.2
[PATCH v5 0/5] Add support for remote unwind
v4 url: http://thread.gmane.org/gmane.linux.kernel/2224430 Currently, perf script uses host unwind methods (local unwind) to parse perf.data callchain info regardless of the target architecture. So we get wrong results and no warning when doing remote unwind on other platforms/machines. This patchset checks whether a dso is 32-bit or 64-bit according to elf class info for each thread, to let perf use the correct remote unwind methods instead. Only x86 and aarch64 are added in this patchset to show the workflow; other platforms can be added easily. We can see the right result for unwind info on different machines, for example: perf.data recorded on i686 qemu with '-g' option and parsed on x86_64 machine. before this patchset: hello 1219 [001] 72190.667975: probe:sys_close: (c1169d60) c1169d61 sys_close ([kernel.kallsyms]) c189c0d7 sysenter_past_esp ([kernel.kallsyms]) b777aba9 [unknown] ([vdso32]) after: (Add vdso into buildid-cache first by 'perf buildid-cache -a' and libraries are provided in symfs dir) hello 1219 [001] 72190.667975: probe:sys_close: (c1169d60) c1169d61 sys_close ([kernel.kallsyms]) c189c0d7 sysenter_past_esp ([kernel.kallsyms]) b777aba9 __kernel_vsyscall ([vdso32]) b76971cc close (/lib/libc-2.22.so) 804842e fib (/tmp/hello) 804849d main (/tmp/hello) b75d746e __libc_start_main (/lib/libc-2.22.so) 8048341 _start (/tmp/hello) For using remote libunwind libraries, refer to: http://thread.gmane.org/gmane.linux.kernel/2224430 and now we can use LIBUNWIND_DIR to specify custom directories containing libunwind libs.
He Kuang (5): perf tools: Use LIBUNWIND_DIR for remote libunwind feature check perf tools: Show warnings for unsupported cross-platform unwind perf callchain: Add support for cross-platform unwind perf callchain: Support x86 target platform perf callchain: Support aarch64 cross-platform .../arch/arm64/include/libunwind/libunwind-arch.h | 18 tools/perf/arch/arm64/util/unwind-libunwind.c | 5 +- tools/perf/arch/common.c | 2 +- tools/perf/arch/common.h | 1 + .../arch/x86/include/libunwind/libunwind-arch.h| 18 tools/perf/arch/x86/util/Build | 2 +- tools/perf/arch/x86/util/unwind-libunwind.c| 19 +++- tools/perf/config/Makefile | 49 - tools/perf/util/Build | 13 ++- tools/perf/util/thread.c | 7 +- tools/perf/util/thread.h | 17 +++- tools/perf/util/unwind-libunwind.c | 49 +++-- tools/perf/util/unwind-libunwind_common.c | 109 + tools/perf/util/unwind.h | 50 -- 14 files changed, 323 insertions(+), 36 deletions(-) create mode 100644 tools/perf/arch/arm64/include/libunwind/libunwind-arch.h create mode 100644 tools/perf/arch/x86/include/libunwind/libunwind-arch.h create mode 100644 tools/perf/util/unwind-libunwind_common.c -- 1.8.5.2
[PATCH v5 1/5] perf tools: Use LIBUNWIND_DIR for remote libunwind feature check
Pass LIBUNWIND_DIR to the feature check flags for the remote libunwind tests, so that perf is able to detect remote libunwind libraries in an arbitrary directory. Signed-off-by: He Kuang --- tools/perf/config/Makefile | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 1e46277..6f9f566 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -67,9 +67,18 @@ endif # # make DEBUG=1 LIBUNWIND_DIR=/opt/libunwind/ # + +libunwind_arch_set_flags = $(eval $(libunwind_arch_set_flags_code)) +define libunwind_arch_set_flags_code + FEATURE_CHECK_CFLAGS-libunwind-$(1) = -I$(LIBUNWIND_DIR)/include + FEATURE_CHECK_LDFLAGS-libunwind-$(1) = -L$(LIBUNWIND_DIR)/lib +endef + ifdef LIBUNWIND_DIR LIBUNWIND_CFLAGS = -I$(LIBUNWIND_DIR)/include LIBUNWIND_LDFLAGS = -L$(LIBUNWIND_DIR)/lib + LIBUNWIND_ARCHS = x86 x86_64 arm aarch64 debug-frame-arm debug-frame-aarch64 + $(foreach libunwind_arch,$(LIBUNWIND_ARCHS),$(call libunwind_arch_set_flags,$(libunwind_arch))) endif LIBUNWIND_LDFLAGS += $(LIBUNWIND_LIBS) -- 1.8.5.2
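With this in place, building perf against a cross-compiled libunwind should come down to something like (path made up for the example):

  make -C tools/perf LIBUNWIND_DIR=/opt/libunwind-aarch64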
[PATCH v5 2/5] perf tools: Show warnings for unsupported cross-platform unwind
Currently, perf script uses host unwind methods to parse perf.data callchain info regardless of the target architecture. So we get wrong results without any warnings when unwinding callchains of x86 (32-bit) on an x86 (64-bit) machine. This patch shows warning messages when we remote-unwind x86 (32-bit) data on other machines. The same for other platforms will be added in subsequent patches. Common functions which will be used by both local unwind and remote unwind are separated into the new file 'unwind-libunwind_common.c'. Signed-off-by: He Kuang --- tools/perf/arch/common.c | 2 +- tools/perf/arch/common.h | 1 + tools/perf/config/Makefile| 5 + tools/perf/util/Build | 1 + tools/perf/util/thread.c | 2 ++ tools/perf/util/unwind-libunwind_common.c | 34 +++ tools/perf/util/unwind.h | 5 + 7 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 tools/perf/util/unwind-libunwind_common.c diff --git a/tools/perf/arch/common.c b/tools/perf/arch/common.c index e83c8ce..fa090a9 100644 --- a/tools/perf/arch/common.c +++ b/tools/perf/arch/common.c @@ -102,7 +102,7 @@ static int lookup_triplets(const char *const *triplets, const char *name) * Return architecture name in a normalized form. * The conversion logic comes from the Makefile. */ -static const char *normalize_arch(char *arch) +const char *normalize_arch(char *arch) { if (!strcmp(arch, "x86_64")) return "x86"; diff --git a/tools/perf/arch/common.h b/tools/perf/arch/common.h index 7529cfb..6b01c73 100644 --- a/tools/perf/arch/common.h +++ b/tools/perf/arch/common.h @@ -6,5 +6,6 @@ extern const char *objdump_path; int perf_env__lookup_objdump(struct perf_env *env); +const char *normalize_arch(char *arch); #endif /* ARCH_PERF_COMMON_H */ diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 6f9f566..c9e1625 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -354,6 +354,11 @@ ifeq ($(ARCH),powerpc) endif ifndef NO_LIBUNWIND + ifeq ($(feature-libunwind-x86), 1) +$(call detected,CONFIG_LIBUNWIND_X86) +CFLAGS += -DHAVE_LIBUNWIND_X86_SUPPORT + endif + ifneq ($(feature-libunwind), 1) msg := $(warning No libunwind found. 
Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR); NO_LIBUNWIND := 1 diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 8c6c8a0..25c31fb 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -100,6 +100,7 @@ libperf-$(CONFIG_DWARF) += dwarf-aux.o libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o +libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind_common.o libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c index 45fcb71..3043113 100644 --- a/tools/perf/util/thread.c +++ b/tools/perf/util/thread.c @@ -205,6 +205,8 @@ void thread__insert_map(struct thread *thread, struct map *map) { map_groups__fixup_overlappings(thread->mg, map, stderr); map_groups__insert(thread->mg, map); + + unwind__get_arch(thread, map); } static int thread__clone_map_groups(struct thread *thread, diff --git a/tools/perf/util/unwind-libunwind_common.c b/tools/perf/util/unwind-libunwind_common.c new file mode 100644 index 000..3946c99 --- /dev/null +++ b/tools/perf/util/unwind-libunwind_common.c @@ -0,0 +1,34 @@ +#include "thread.h" +#include "session.h" +#include "unwind.h" +#include "symbol.h" +#include "debug.h" +#include "arch/common.h" + +void unwind__get_arch(struct thread *thread, struct map *map) +{ + const char *arch; + enum dso_type dso_type; + + if (!thread->mg->machine->env) + return; + + dso_type = dso__type(map->dso, thread->mg->machine); + if (dso_type == DSO__TYPE_UNKNOWN) + return; + + if (thread->addr_space) + pr_debug("unwind: thread map already set, 64bit is %d, dso=%s\n", +dso_type == DSO__TYPE_64BIT, map->dso->name); + + arch = normalize_arch(thread->mg->machine->env->arch); + + if (!strcmp(arch, "x86")) { + if (dso_type != DSO__TYPE_64BIT) +#ifdef HAVE_LIBUNWIND_X86_SUPPORT + pr_err("unwind: target platform=%s is not implemented\n", arch); +#else + pr_err("unwind: target platform=%s is not supported\n", arch); +#endif + } +} diff --git a/tools/perf/util/unwind.h b/tools/perf/util/unwind.h index 12790cf..889d630 100644 --- a/tools/perf/util/unwind.h +++ b/tools/perf/util/unwind.h @@ -24,6 +24,7 @@ int libunwind__arch_reg_id(int regnum); int unwind__prepare_access(struct thread *thread); void unwind__flush_access(struct thread *thread); void unwind__finish_access(struct thread *thread); +void unwind__get_ar