Re: [PATCH 0/4] net-next: dsa: fix flow dissection

2017-08-09 Thread David Miller
From: John Crispin 
Date: Wed,  9 Aug 2017 14:41:15 +0200

> RPS and probably other kernel features are currently broken on some if not
> all DSA devices. The root cause of this is that skb_hash will call the
> flow_dissector. At this point the skb still contains the magic switch
> header and the skb->protocol field is not set up to the correct 802.3
> value yet. By the time the tag specific code is called, removing the header
> and properly setting the protocol, an invalid hash is already set. In the
> case of the mt7530 this will result in all flows always having the same
> hash.
> 
> Changes since RFC:
> * use a callback instead of static values
> * add cover letter

Series applied, thanks.
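
For readers of the archive, the failure described in the cover letter can be sketched in user-space C. This is only an illustration under stated assumptions, not the code from the series: the 4-byte tag length, the tag contents and every toy_* name are made up here. It shows that a dissector which looks for the EtherType at the usual offset reads tag bytes instead, so every tagged frame ends up with the same fallback hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN  6     /* length of one MAC address */
#define TOY_TAG_LEN   4     /* hypothetical switch tag length, for illustration */
#define TOY_FRAME_LEN 64

/* Read the big-endian EtherType located tag_len bytes after the two MACs. */
static uint16_t toy_ethertype(const uint8_t *frame, size_t tag_len)
{
	const uint8_t *p = frame + 2 * TOY_ETH_ALEN + tag_len;

	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Toy flow hash: FNV-1a over the IPv4 source and destination addresses.
 * Any frame whose EtherType is not IPv4 gets the same fallback hash of 0.
 */
static uint32_t toy_flow_hash(const uint8_t *frame, size_t len, size_t tag_len)
{
	size_t l3 = 2 * TOY_ETH_ALEN + tag_len + 2;   /* start of the IPv4 header */
	uint32_t h = 2166136261u;
	size_t i;

	if (len < l3 + 20 || toy_ethertype(frame, tag_len) != 0x0800)
		return 0;
	for (i = 12; i < 20; i++) {                   /* saddr + daddr bytes */
		h ^= frame[l3 + i];
		h *= 16777619u;
	}
	return h;
}

/* Build a tagged IPv4 frame from 10.0.0.<src> to 10.0.0.254. */
static void toy_build_tagged_frame(uint8_t *buf, uint8_t src)
{
	size_t l3 = 2 * TOY_ETH_ALEN + TOY_TAG_LEN + 2;

	memset(buf, 0, TOY_FRAME_LEN);
	buf[13] = 0x01;                 /* made-up tag contents */
	buf[16] = 0x08;                 /* the real EtherType 0x0800 sits after the tag */
	buf[17] = 0x00;
	buf[l3] = 0x45;                 /* IPv4, 20-byte header */
	buf[l3 + 12] = 10; buf[l3 + 15] = src;
	buf[l3 + 16] = 10; buf[l3 + 19] = 254;
}
```

Calling toy_flow_hash() with tag_len 0 models hashing before the tag is accounted for: two different flows collide. Passing TOY_TAG_LEN models a dissector that has been told how far to skip, which is the kind of information a per-tagger callback could supply.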


Re: [PATCH 0/2] net-next: mediatek: bring up QDMA RX ring 0

2017-08-09 Thread David Miller
From: John Crispin 
Date: Wed,  9 Aug 2017 12:09:30 +0200

> The MT7623 has several DMA rings. Inside the SW path, the core will use
> the PDMA when receiving traffic. While bringing up the HW path we noticed
> that the PPE requires the QDMA RX to also be brought up as it uses this
> ring internally for its flow scheduling.

Series applied.


Re: [PATCH] net: atm: make atmdev_ops const

2017-08-09 Thread David Miller
From: Bhumika Goyal 
Date: Wed,  9 Aug 2017 15:02:08 +0530

> Make these const as they are only stored in the ops field of an atm_dev
> structure, which is const.
> Done using Coccinelle.
> 
> Signed-off-by: Bhumika Goyal 

Applied.


Re: [PATCH] atm: make atmdev_ops const

2017-08-09 Thread David Miller
From: Bhumika Goyal 
Date: Wed,  9 Aug 2017 14:49:15 +0530

> Make these structures const as they are either passed to the function
> atm_dev_register having the corresponding argument as const or stored in
> the ops field of an atm_dev structure, which is also const.
> Done using Coccinelle.
> 
> Signed-off-by: Bhumika Goyal 

Applied.
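
The constification pattern these patches rely on can be sketched as follows. It is a minimal stand-alone illustration with hypothetical toy_* names, not the ATM code itself: because both the registration function's parameter and the device's ops field are pointers to const, the ops table can be declared const and placed in read-only data.

```c
#include <stddef.h>

struct toy_dev;

/* Table of operations; never modified after build time. */
struct toy_ops {
	int (*open)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_ops *ops;  /* pointer to const: the table is only read */
	int opened;
};

static int toy_open(struct toy_dev *dev)
{
	dev->opened = 1;
	return 0;
}

/* Declared const, so the compiler rejects any later write to it. */
static const struct toy_ops toy_default_ops = {
	.open = toy_open,
};

/* The parameter is const as well, so no cast is needed at the call site. */
static void toy_dev_register(struct toy_dev *dev, const struct toy_ops *ops)
{
	dev->ops = ops;
	dev->opened = 0;
}
```

If either the parameter or the field were a plain (non-const) pointer, passing &toy_default_ops would discard the qualifier and draw a compiler warning, which is why the change is only safe for structures used in this way.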


Re: [PATCH] net: dsa: make dsa_switch_ops const

2017-08-09 Thread David Miller
From: Bhumika Goyal 
Date: Wed,  9 Aug 2017 10:34:15 +0530

> Make these structures const as they are only stored in the ops field of
> a dsa_switch structure, which is const.
> Done using Coccinelle.
> 
> Signed-off-by: Bhumika Goyal 

Applied, thank you.


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-09 Thread Wei Wang
Hi John,

Is it possible to try the attached patch?
I am not sure if it actually fixes the issue. But I think it is worth a try.
Also, could you get me all the ipv6 routes when you plug in the usb
using "ip -6 route show"? (If you have multiple routing tables
configured, could you dump them all?)

Thanks a lot.
Wei

On Wed, Aug 9, 2017 at 6:36 PM, Wei Wang  wrote:
> On Wed, Aug 9, 2017 at 6:26 PM, John Stultz  wrote:
>> On Wed, Aug 9, 2017 at 5:36 PM, Wei Wang  wrote:
>>> On Wed, Aug 9, 2017 at 4:44 PM, John Stultz  wrote:
>>>> On Wed, Aug 9, 2017 at 4:34 PM, Cong Wang wrote:
>>>>> (Cc'ing Wei whose commit was blamed)
>>>>>
>>>>> On Mon, Aug 7, 2017 at 2:15 PM, John Stultz wrote:
>>>>>> On Mon, Aug 7, 2017 at 2:05 PM, John Stultz wrote:
>>>>>>> So, with recent testing with my HiKey board, I've been noticing some
>>>>>>> quirky behavior with my USB eth adapter.
>>>>>>>
>>>>>>> Basically, plugging the usb eth adapter in and then removing it, when
>>>>>>> plugging it back in I often find that it's not detected, and the system
>>>>>>> slowly spits out the following message over and over:
>>>>>>>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
>>>>>>
>>>>>> The other bit is that after this starts printing, the board will no
>>>>>> longer reboot (it hangs continuing to occasionally print the above
>>>>>> message), and I have to manually reset the device.
>>>>>>
>>>>>
>>>>> So this warning is not temporarily shown but lasts until a reboot,
>>>>> right? If so it is a dst refcnt leak.
>>>>
>>>> Correct, once I get into the state it lasts until a reboot.
>>>>
>>>>> How reproducible is it for you? From my reading, it seems always
>>>>> reproduced when you unplug and plug your usb eth interface?
>>>>> Is there anything else involved? For example, network namespace.
>>>>
>>>> So with 4.13-rc3/4 I seem to trigger it easily, often with the first
>>>> unplug of the USB eth adapter.
>>>>
>>>> But as I get back closer to 4.12, it seemingly becomes harder to
>>>> trigger, but sometimes still happens.
>>>>
>>>> So far, I've not been able to trigger it with 4.12.
>>>>
>>>> I don't think network namespaces are involved?  Though it's out of my
>>>> area, so AOSP may be using them these days.  Is there a simple way to
>>>> check?
>>>>
>>>> I'll also do another bisection to see if the bad point moves back any
>>>> further.
>>
>> So I went through another bisection around and got  9514528d92d4 ipv6:
>> call dst_dev_put() properly as the first bad commit again.
>>
>>> If you see the problem starts to happen on commit
>>> 9514528d92d4cbe086499322370155ed69f5d06c, could you try reverting all
>>> the following commits:
>>> (from new to old)
>>> 1eb04e7c9e63 net: reorder all the dst flags
>>> a4c2fd7f7891 net: remove DST_NOCACHE flag
>>> b2a9c0ed75a3 net: remove DST_NOGC flag
>>> 5b7c9a8ff828 net: remove dst gc related code
>>> db916649b5dd ipv6: get rid of icmp6 dst garbage collector
>>> 587fea741134 ipv6: mark DST_NOGC and remove the operation of dst_free()
>>> ad65a2f05695 ipv6: call dst_hold_safe() properly
>>> 9514528d92d4 ipv6: call dst_dev_put() properly
>>
>>
>> And reverting this set off of 4.13-rc4 seems to make the issue go away.
>>
>> Is there anything I can test to help narrow down the specific problem
>> with that patchset?
>>
>
> Thanks John for confirming.
> Let me spend some time on the commits and I will let you know if I
> have some debug image for you to try.
>
> Wei
>
>
>> thanks
>> -john
From 93f2836679c81915b110ff56617f9f5dae2e6927 Mon Sep 17 00:00:00 2001
From: Wei Wang 
Date: Wed, 9 Aug 2017 22:27:36 -0700
Subject: [PATCH] ipv6: unregister netdev bug fix

Change-Id: I30fa739989ac50fbc7f4cbc6a04130005589cc25
---
 include/net/ip6_route.h |  1 +
 net/ipv6/addrconf.c | 10 +++---
 net/ipv6/anycast.c  |  3 ++-
 net/ipv6/route.c|  2 +-
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 907d39a42f6b..dec1424ce619 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -94,6 +94,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg);
 int ip6_route_add(struct fib6_config *cfg, struct netlink_ext_ack *extack);
 int ip6_ins_rt(struct rt6_info *);
 int ip6_del_rt(struct rt6_info *);
+void rt6_uncached_list_add(struct rt6_info *rt);
 
 static inline int ip6_route_get_saddr(struct net *net, struct rt6_info *rt,
   const struct in6_addr *daddr,
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3c46e9513a31..06a27addb93c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3079,7 +3079,8 @@ static void init_loopback(struct net_device *dev)
 			/* Failure cases are ignored */
 			if (!IS_ERR(sp_rt)) {
 sp_ifa->rt = sp_rt;
-ip6_ins_rt(sp_rt);
+if (ip6_ins_rt(sp_rt))
+	rt6_uncached_list_add(sp_rt);
 			}
 		}
 		read_unlock_bh(&idev->lock);
@@ -3711,6 +3712,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 			
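
As background for the "Usage count = 1" message discussed in this thread, here is a toy reference-counting model in C. It is only a sketch with hypothetical toy_* names and is not the kernel's dst or netdevice code: it shows how a single object that never drops its reference keeps the count above zero, so the device can never be reported as free.

```c
#include <stddef.h>

struct toy_netdev {
	int refcnt;                 /* number of objects still pointing at the device */
};

struct toy_route {
	struct toy_netdev *dev;     /* holds one reference while non-NULL */
};

/* Taking a route on a device takes a reference on that device. */
static void toy_route_attach(struct toy_route *rt, struct toy_netdev *dev)
{
	rt->dev = dev;
	dev->refcnt++;
}

/* Releasing the route must drop the reference exactly once. */
static void toy_route_release(struct toy_route *rt)
{
	if (rt->dev) {
		rt->dev->refcnt--;
		rt->dev = NULL;
	}
}

/* Unregistering can only complete once nobody references the device. */
static int toy_can_unregister(const struct toy_netdev *dev)
{
	return dev->refcnt == 0;
}
```

In this model, a route that is never passed to toy_route_release() (for example because it is not on any list that gets walked at teardown) leaves refcnt at 1 forever, which corresponds to the symptom reported above.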

Re: [PATCH] PCI: dwc: dra7xx: fix error return code in dra7xx_pcie_probe()

2017-08-09 Thread Kishon Vijay Abraham I


On Wednesday 09 August 2017 09:46 PM, Gustavo A. R. Silva wrote:
> platform_get_irq() returns an error code, but the pci-dra7xx driver
> ignores it and always returns -EINVAL. This is not correct and
> prevents -EPROBE_DEFER from being propagated properly.
> 
> Print and propagate the return value of platform_get_irq on failure.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Acked-by: Kishon Vijay Abraham I 
> ---
>  drivers/pci/dwc/pci-dra7xx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> index f2fc5f4..f58e1b4 100644
> --- a/drivers/pci/dwc/pci-dra7xx.c
> +++ b/drivers/pci/dwc/pci-dra7xx.c
> @@ -616,8 +616,8 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev)
>  
>   irq = platform_get_irq(pdev, 0);
>   if (irq < 0) {
> - dev_err(dev, "missing IRQ resource\n");
> - return -EINVAL;
> + dev_err(dev, "missing IRQ resource: %d\n", irq);
> + return irq;
>   }
>  
>   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ti_conf");
> 
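
The difference between the old and new behaviour can be sketched in user-space C. This is an illustration with hypothetical toy_* functions, not the driver: TOY_EPROBE_DEFER mirrors the kernel-internal EPROBE_DEFER value (517), which user-space errno.h does not define, and the IRQ number 42 is arbitrary.

```c
#include <errno.h>

#define TOY_EPROBE_DEFER 517    /* assumed to match the kernel-internal value */

/* Stand-in for an IRQ lookup that may ask the caller to retry later. */
static int toy_get_irq(int provider_ready)
{
	return provider_ready ? 42 : -TOY_EPROBE_DEFER;
}

/* Old behaviour: every failure is flattened into -EINVAL. */
static int toy_probe_old(int provider_ready)
{
	int irq = toy_get_irq(provider_ready);

	if (irq < 0)
		return -EINVAL;
	return 0;
}

/* New behaviour: the original error code reaches the caller unchanged. */
static int toy_probe_new(int provider_ready)
{
	int irq = toy_get_irq(provider_ready);

	if (irq < 0)
		return irq;
	return 0;
}
```

With the old variant the caller cannot tell "try again later" apart from a hard failure, so a deferred probe would never be retried; the new variant preserves that distinction.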


Re: kvm: WARNING in kvm_arch_vcpu_ioctl_run

2017-08-09 Thread Wanpeng Li
2017-08-10 1:07 GMT+08:00 Dmitry Vyukov :
> Hello,
>
> syzkaller fuzzer has hit the following WARNING in kvm_arch_vcpu_ioctl_run.
> This is easily reproducible and reproducer is attached at the bottom.
> The report is on upstream commit
> 26c5cebfdb6ca799186f1e56be7d6f2480c5012c. This requires setting
> kvm-intel.unrestricted_guest=0 on the machine, with
> unrestricted_guest=1 the WARNING does not happen. Output of the
> program is:
>
> ret1=0 exit_reason=17 suberror=1
> ret2=0 exit_reason=8 suberror=65530

Please have a try. https://lkml.org/lkml/2017/8/10/27

Regards,
Wanpeng Li

>
>
> WARNING: CPU: 1 PID: 2850 at arch/x86/kvm/x86.c:7223
> kvm_arch_vcpu_ioctl_run+0x213/0x5870 arch/x86/kvm/x86.c:7223
> CPU: 1 PID: 2850 Comm: a.out Not tainted 4.13.0-rc3+ #445
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:52
>  panic+0x1e4/0x417 kernel/panic.c:180
>  __warn+0x1c4/0x1d9 kernel/panic.c:541
>  report_bug+0x211/0x2d0 lib/bug.c:183
>  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
>  do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
>  do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
>  do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
>  invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
> RIP: 0010:kvm_arch_vcpu_ioctl_run+0x213/0x5870 arch/x86/kvm/x86.c:7222
> RSP: 0018:8800627cf670 EFLAGS: 00010297
> RAX: 880066c44480 RBX: 88006b07f000 RCX: 880067d0845c
> RDX:  RSI:  RDI: 880067d08260
> RBP: 8800627cfa40 R08: 0001 R09: 
> R10: 8800627cfa58 R11:  R12: 0001
> R13:  R14: 88006b934880 R15: 880067d08040
>  kvm_vcpu_ioctl+0x64c/0x1010 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2592
>  vfs_ioctl fs/ioctl.c:45 [inline]
>  do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
>  SYSC_ioctl fs/ioctl.c:700 [inline]
>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x44ccc9
> RSP: 002b:7f9ecbc37db8 EFLAGS: 0297 ORIG_RAX: 0010
> RAX: ffda RBX:  RCX: 0044ccc9
> RDX:  RSI: ae80 RDI: 0005
> RBP: 0082 R08: 7f9ecbc38700 R09: 
> R10: 7f9ecbc38700 R11: 0297 R12: 
> R13:  R14: 7f9ecbc389c0 R15: 7f9ecbc38700
>
>
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #define _GNU_SOURCE
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int kvmcpu;
> struct kvm_run *run;
>
> void* thr(void* arg)
> {
>   int res;
>   res = ioctl(kvmcpu, KVM_RUN, 0);
>   printf("ret1=%d exit_reason=%d suberror=%d\n",
>   res, run->exit_reason, run->internal.suberror);
>   return 0;
> }
>
> void test()
> {
>   int i, kvm, kvmvm;
>   pthread_t th[4];
>
>   kvm = open("/dev/kvm", O_RDWR);
>   kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
>   kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
>   run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE,
> MAP_SHARED, kvmcpu, 0);
>   srand(getpid());
>   for (i = 0; i < 4; i++) {
> pthread_create(&th[i], 0, thr, 0);
> usleep(rand() % 1);
>   }
>   for (i = 0; i < 4; i++)
> pthread_join(th[i], 0);
> }
>
> int main()
> {
>   for (;;) {
> int pid = fork();
> if (pid < 0)
>   exit(1);
> if (pid == 0) {
>   test();
>   exit(0);
> }
> int status;
> while (waitpid(pid, &status, __WALL) != pid) {}
>   }
>   return 0;
> }


[PATCH] KVM: X86: Fix residual mmio emulation request to userspace

2017-08-09 Thread Wanpeng Li
Reported by syzkaller:

With kvm-intel.unrestricted_guest=0:

   WARNING: CPU: 5 PID: 1014 at /home/kernel/data/kvm/arch/x86/kvm//x86.c:7227 kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
   CPU: 5 PID: 1014 Comm: warn_test Tainted: GW  OE   4.13.0-rc3+ #8
   RIP: 0010:kvm_arch_vcpu_ioctl_run+0x38b/0x1be0 [kvm]
   Call Trace:
? put_pid+0x3a/0x50
? rcu_read_lock_sched_held+0x79/0x80
? kmem_cache_free+0x2f2/0x350
kvm_vcpu_ioctl+0x340/0x700 [kvm]
? kvm_vcpu_ioctl+0x340/0x700 [kvm]
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc2
? __this_cpu_preempt_check+0x13/0x20

The syzkaller folks reported a residual mmio emulation request to userspace.
It occurs because vm86 fails to emulate injecting a real mode interrupt (it
fails to read CS) and incurs a triple fault. The vCPU returns to userspace
with vcpu->mmio_needed == true and the KVM_EXIT_SHUTDOWN exit reason. However,
the syzkaller testcase creates several threads that run the same vCPU; a
thread that launches this vCPU after another thread has left
vcpu->mmio_needed == true with KVM_EXIT_SHUTDOWN will trigger the warning.

   #define _GNU_SOURCE
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   #include 
   
   int kvmcpu;
   struct kvm_run *run;
   
   void* thr(void* arg)
   {
 int res;
 res = ioctl(kvmcpu, KVM_RUN, 0);
 printf("ret1=%d exit_reason=%d suberror=%d\n",
 res, run->exit_reason, run->internal.suberror);
 return 0;
   }
   
   void test()
   {
 int i, kvm, kvmvm;
 pthread_t th[4];
   
 kvm = open("/dev/kvm", O_RDWR);
 kvmvm = ioctl(kvm, KVM_CREATE_VM, 0);
 kvmcpu = ioctl(kvmvm, KVM_CREATE_VCPU, 0);
 run = (struct kvm_run*)mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, kvmcpu, 0);
 srand(getpid());
 for (i = 0; i < 4; i++) {
   pthread_create(&th[i], 0, thr, 0);
   usleep(rand() % 1);
 }
 for (i = 0; i < 4; i++)
   pthread_join(th[i], 0);
   }
   
   int main()
   {
 for (;;) {
   int pid = fork();
   if (pid < 0)
 exit(1);
   if (pid == 0) {
 test();
 exit(0);
   }
   int status;
   while (waitpid(pid, &status, __WALL) != pid) {}
 }
 return 0;
   }

This patch fixes it by resetting the vcpu->mmio_needed once we receive 
the triple fault to avoid the residue.

Reported-by: Dmitry Vyukov 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Dmitry Vyukov 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 1 +
 arch/x86/kvm/x86.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8e4a2dc..77ab10b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5864,6 +5864,7 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu)
 static int handle_triple_fault(struct kvm_vcpu *vcpu)
 {
vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+   vcpu->mmio_needed = 0;
return 0;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72d82ab..1e143f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6776,6 +6776,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+   vcpu->mmio_needed = 0;
r = 0;
goto out;
}
-- 
2.7.4
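
The stale-flag problem the patch addresses can be modelled with a small user-space sketch. It is loosely based on the description above and uses hypothetical toy_* names; it is not KVM code. A flag set during one run survives into the next run unless the shutdown path clears it, and the next entry then trips the sanity check.

```c
#include <stdbool.h>

enum toy_exit { TOY_EXIT_MMIO, TOY_EXIT_SHUTDOWN };

struct toy_vcpu {
	bool mmio_needed;   /* an MMIO request is pending for userspace */
	int warnings;       /* how often the entry sanity check fired */
};

/*
 * One toy "run": the entry check stands in for the kernel's warning,
 * then emulation starts an MMIO request but escalates to a shutdown exit.
 */
static enum toy_exit toy_run(struct toy_vcpu *v, bool clear_on_shutdown)
{
	if (v->mmio_needed)
		v->warnings++;              /* stale request from an earlier run */

	v->mmio_needed = true;          /* emulation begins an MMIO access ... */

	/* ... but fails and ends in a triple fault instead of an MMIO exit. */
	if (clear_on_shutdown)
		v->mmio_needed = false;     /* the fix: drop the request */
	return TOY_EXIT_SHUTDOWN;
}
```

Running toy_run() twice without clearing mirrors the multi-threaded reproducer, where a second thread enters the same vCPU after the first has already returned with the shutdown exit.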



Re: [RFC v1 4/4] ipmi_bmc: bt-aspeed: port driver to IPMI BMC framework

2017-08-09 Thread Brendan Higgins
On Wed, Aug 9, 2017 at 7:31 PM, Jeremy Kerr  wrote:
> Hi Brendan,
>
>> The driver was handling interaction with userspace on its own. This
>> patch changes it to use the functionality of the ipmi_bmc framework
>> instead.
>>
>> Note that this removes the ability for the BMC to set SMS_ATN by making
>> an ioctl. If this functionality is required, it can be added back in
>> with a later patch.
>
> As Chris has mentioned, we do use this actively at the moment, so I'd
> prefer if we could not drop the support for SMS_ATN. However, using a
> different interface should be fine, if that helps.

Whoops, I did not realize that anyone was using it. Yeah, adding it back in
should not be too hard.

>
> Cheers,
>
>
> Jeremy


Re: [PATCH 1/7] mtd: spi-nor: cadence-quadspi: add a delay in write sequence

2017-08-09 Thread Vignesh R


On Thursday 10 August 2017 05:38 AM, Rob Herring wrote:
> On Tue, Aug 01, 2017 at 10:24:28AM +0530, Vignesh R wrote:
>> As per 66AK2G02 TRM[1] SPRUHY8F section 11.15.5.3 Indirect Access
>> Controller programming sequence, a delay equal to couple QSPI master
>> clock(~5ns) is required after setting CQSPI_REG_INDIRECTWR_START bit and
>> writing data to the flash. Add a new compatible to handle the couple of
>> cycles of delay required in the indirect write sequence, since this
>> delay is specific to TI 66AK2G SoC.
>>
>> [1]http://www.ti.com/lit/ug/spruhy8f/spruhy8f.pdf
>>
>> Signed-off-by: Vignesh R 
>> ---
>>  Documentation/devicetree/bindings/mtd/cadence-quadspi.txt |  1 +
>>  drivers/mtd/spi-nor/cadence-quadspi.c | 13 +
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt b/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
>> index f248056da24c..fdd511a83511 100644
>> --- a/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
>> +++ b/Documentation/devicetree/bindings/mtd/cadence-quadspi.txt
>> @@ -2,6 +2,7 @@
>>  
>>  Required properties:
>>  - compatible : Should be "cdns,qspi-nor".
>> +   Should be "ti,k2g-qspi" for TI 66AK2G platform.
> 
> Also, this doesn't indicate that "cdns,qspi-nor" is a fallback as you 
> have in the dts files. Reformat to 1 valid combination per line.
> 

Agreed, will fix it in v2.

-- 
Regards
Vignesh


Re: [PATCH v2 0/4] ipmi: bt-i2c: added IPMI Block Transfer over I2C

2017-08-09 Thread Brendan Higgins
On Wed, Aug 9, 2017 at 7:26 PM, Corey Minyard  wrote:
> On 08/09/2017 08:04 PM, Brendan Higgins wrote:
>>>
>>> Perhaps that is some level of abuse, but it's pretty common.  I'm not
>>> against it.
>>>
>>> There is standard IPMI firmware NetFN (though no commands defined) that
>>> if you use the driver automatically goes into "Maintenance mode" and
>>> modified the timeouts and handling to some extent to help with this.
>>
>> That is a really good point, I missed that.
>> ...
>>>
>>>
>>> There are ways to accomplish this that aren't that complex.  You can
>>> create an OEM command that can query the maximum message size and the
>>> ability to do sequence numbers in the messages.
>>>
>>> If messages larger than 32-bytes are supported, and the host I2C/SMBus
>>> driver supports it, you could use the standard SSIF SMBus commands to do
>>> this, they have an 8-bit length field.
>>>
>>> If sequence numbers are supported, The SSIF could use different SMBus
>>> commands to do the write and read requests.  Since this is only if you
>>> get an OEM command, and if you put the sequence numbers at the end where
>>> they are easy to add on the send side, this is a small change to the
>>> driver.
>>
>> What if we just had an OEM command that changed the message structure from
>> that point on? We could abuse the "maintenance mode" NetFN to get back
>> into normal SSIF if necessary.
>
>
> Actually, I wouldn't have a separate "openbmc mode".  I would have OpenBMC
> always work with standard SSIF, and have separate SMBus commands for
> messages with the sequence number and messages larger than 32 bytes.
>
> I've attached a patch with what I would expect the changes to be to the
> host driver.  It doesn't handle multiple outstanding messages, but it
> shows what detection and a separate SMBus command would look like.

I took a look at the patch, it seems reasonable. If I was maintaining SSIF,
I probably would not want that kind of clutter for my admittedly weird use
case, but if you're okay with it, then so am I.

>
>
>>> So I think the changes would be small and contained.  I'm actually ok
>>> with a different driver, but I think it would be more valuable to the
>>> OpenBMC project to have a standardized interface that would work (in a
>>> not quite as efficient mode) with software that does not use the Linux
>>> IPMI driver.
>>
>> I guess I see the all of my asks as hacky things which we can hopefully
>> remove at some point. Hopefully, most OpenBMC users won't want or need
>> these things.
>> ...

>>>> Regardless of what we do with the "BT-I2C" stuff, I am still interested
>>>> in what you think about this.
>>>
>>>
>>> I think you are right, it probably belongs some place else.  The way that
>>> makes the most sense to me would be to have an "ipmi" directory with a
>>> "host" and "slave" side, and since ipmi is not really a char driver, to
>>> move it to the main driver directory.  That might be fairly disruptive,
>>> though.
>>
>> That was my thinking exactly.
>>
>>> The other option that makes sense to me would be to add a
>>> drivers/char/ipmi_slave directory, or something like that, and put the
>>> slave code there.  That would be less disruptive.
>>
>> Right that is the approach I took, except I called it
>> drivers/char/ipmi_bmc.
>>
>> I originally thought doing the less disruptive thing is best; however, I
>> know there are also some OpenBMC people who are interested in implementing
>> IPMB. So maybe now is the time to bite the bullet and create an ipmi
>> directory under drivers/.
>
>
> I'm not sure IPMB would make much difference, there's no host side change
> as it's already supported.  I don't think there would be any significant
> code sharing between the two.

No, I don't expect much code sharing between them. I just thought it would be a
reasonable place to put IPMB, sort of like how we have a bunch of "character"
device drivers in drivers/char, but I suppose that might be somewhat of an
anti-pattern ;-)

>
> If there end up being a significant amount of common code, then it would
> definitely be worth the effort to move it.
>
>>> -corey
>>
>> In summary, I think I can live with making it a mangled form of SSIF, but
>> I would prefer to put it in its own driver.
>
>
> You can look at the patch and consider it, and consider that you would need
> to implement flag and event handling.  On an x86 host there would be SMBIOS
> and ACPI stuff to deal with somehow for discovery.  There's probably few
> other things to deal with.
>
>> In any case, I think I would rather focus on the BMC side IPMI
>> framework now, since it is a bigger change and would also reduce the work
>> of implementing a BMC side SSIF driver.
>>
>> Here is what I propose: we focus on the BMC side IPMI framework RFC that
>> I sent out the other day:
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1463473.html
>> I will 

[GIT] Sparc

2017-08-09 Thread David Miller

Please pull to get these sparc changes:

1) Recognize M8 cpus, just basic chip ID matching, from Allen Pais.

2) Prevent crashes when bringing up sunvdc virtual block devices in
   some environments.  From Jim Quigley.

Thanks!

The following changes since commit 0a23ea65ce9f10ec2ea392571006b781b150327f:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc (2017-08-04 10:17:45 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git 

for you to fetch changes up to 3ee70591d6c47ef4c4699b3395ba96ce287db937:

  sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain (2017-08-09 22:22:32 -0700)


Allen Pais (2):
  sparc64: properly name the cpu constants
  sparc64: recognize and support sparc M8 cpu type

Jim Quigley (1):
  sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain

Vijay Kumar (1):
  sparc64: Increase max_phys_bits to 51 and VA bits to 53 for M8.

 arch/sparc/include/asm/spitfire.h | 16
 arch/sparc/kernel/cpu.c           |  6 ++
 arch/sparc/kernel/cpumap.c        |  1 +
 arch/sparc/kernel/head_64.S       | 22 ++
 arch/sparc/kernel/setup_64.c      | 15 +--
 arch/sparc/mm/init_64.c           | 14 +-
 drivers/block/sunvdc.c            | 61 +
 7 files changed, 124 insertions(+), 11 deletions(-)


Re: [PATCH 2/7] mtd: spi-nor: cadence-quadspi: Add support to enable loopback clock circuit

2017-08-09 Thread Vignesh R


On Thursday 10 August 2017 05:40 AM, Rob Herring wrote:
> On Tue, Aug 01, 2017 at 10:24:29AM +0530, Vignesh R wrote:
>> Cadence QSPI IP has a adapted loopback circuit which can be enabled by
>> setting BYPASS field to 0 in READCAPTURE register. It enables use of
>> QSPI return clock to latch the data rather than the internal QSPI
>> reference clock. For high speed operations, adapted loopback circuit
>> using QSPI return clock helps to increase data valid window.
>>
>> Add DT parameter cdns,rclk-en to help enable adapted loopback circuit
>> for boards which do have QSPI return clock provided.
>> This patch also modifies cqspi_readdata_capture() function's bypass
>> parameter to bool to match how its used in the function.
>>
>> Signed-off-by: Vignesh R 
>> ---
>>  Documentation/devicetree/bindings/mtd/cadence-quadspi.txt | 3 +++
>>  drivers/mtd/spi-nor/cadence-quadspi.c | 8 ++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> Please separate bindings to a separate patch or patches.
> 
OK, will do that in v2.


-- 
Regards
Vignesh


Re: [PATCH 1/7] mtd: spi-nor: cadence-quadspi: add a delay in write sequence

2017-08-09 Thread Vignesh R


On Thursday 10 August 2017 05:35 AM, Rob Herring wrote:
> On Tue, Aug 01, 2017 at 10:24:28AM +0530, Vignesh R wrote:
>> As per 66AK2G02 TRM[1] SPRUHY8F section 11.15.5.3 Indirect Access
>> Controller programming sequence, a delay equal to couple QSPI master
>> clock(~5ns) is required after setting CQSPI_REG_INDIRECTWR_START bit and
>> writing data to the flash. Add a new compatible to handle the couple of
>> cycles of delay required in the indirect write sequence, since this
>> delay is specific to TI 66AK2G SoC.
>>
>> [1]http://www.ti.com/lit/ug/spruhy8f/spruhy8f.pdf
>>
[...]
>> +	/*
>> +	 * As per 66AK2G02 TRM SPRUHY8F section 11.15.5.3 Indirect Access
>> +	 * Controller programming sequence, couple of cycles of
>> +	 * QSPI_REF_CLK delay is required for the above bit to
>> +	 * be internally synchronized by the QSPI module. Provide 5
>> +	 * cycles of delay.
>> +	 */
>> +	ndelay(cqspi->wr_delay);
>>  
>>  	while (remaining > 0) {
>>  		write_bytes = remaining > page_size ? page_size : remaining;
>> @@ -1213,6 +1222,9 @@ static int cqspi_probe(struct platform_device *pdev)
>>  	}
>>  
>>  	cqspi->master_ref_clk_hz = clk_get_rate(cqspi->clk);
>> +	if (of_device_is_compatible(dev->of_node, "ti,k2g-qspi"))
>> +		cqspi->wr_delay = 5 * DIV_ROUND_UP(NSEC_PER_SEC,
>> +						   cqspi->master_ref_clk_hz);
> 
> Use the data pointer in the of_device_id table and put the delay value 
> there.
> 

I thought about this, but for a given SoC the delay value may vary
depending on the frequency the QSPI master_ref_clk is set to, so
hard-coding it will not help. How about having a flag
(CQSPI_NEEDS_WR_DELAY) in the data pointer and then using that to
determine whether or not to calculate and set the delay here?
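
To make the idea concrete, here is a minimal userspace sketch, not the
driver code. CQSPI_NEEDS_WR_DELAY is only the flag name proposed above;
cqspi_wr_delay_ns() and div_round_up() are hypothetical helpers standing
in for the probe-time calculation and the kernel's DIV_ROUND_UP():

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical quirk flag, named after the proposal in this thread. */
#define CQSPI_NEEDS_WR_DELAY (1u << 0)

/* Round-up integer division, same idea as the kernel's DIV_ROUND_UP(). */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Return the indirect-write delay in nanoseconds: five reference clock
 * periods when the quirk flag is set, zero otherwise. The clock rate is
 * passed in, so the result follows whatever rate the clock runs at.
 */
uint64_t cqspi_wr_delay_ns(unsigned int quirks, uint64_t ref_clk_hz)
{
	if (!(quirks & CQSPI_NEEDS_WR_DELAY) || ref_clk_hz == 0)
		return 0;

	return 5 * div_round_up(NSEC_PER_SEC, ref_clk_hz);
}
```

For example, with a 384 MHz reference clock one period rounds up to 3 ns,
so the sketch yields 15 ns; a controller without the flag gets 0.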

>>  
>>  	ret = devm_request_irq(dev, irq, cqspi_irq_handler, 0,
>>  			       pdev->name, cqspi);
>> @@ -1285,6 +1297,7 @@ static const struct dev_pm_ops cqspi__dev_pm_ops = {
>>  
>>  static const struct of_device_id cqspi_dt_ids[] = {
>>  	{.compatible = "cdns,qspi-nor",},
>> +	{.compatible = "ti,k2g-qspi",},
>>  	{ /* end of table */ }
>>  };
>>  
>> -- 
>> 2.13.3
>>

-- 
Regards
Vignesh


[PATCH v2] memory: mtk-smi: Handle return value of clk_prepare_enable

2017-08-09 Thread Arvind Yadav
clk_prepare_enable() can fail here and we must check its return value.

Signed-off-by: Arvind Yadav 
---
changes in v2:
 Rebased on patch [1] https://lkml.org/lkml/2017/8/3/968
 and applied this change on top; otherwise there would be a merge conflict.

 drivers/memory/mtk-smi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 2b798bb..583fb8d 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -315,6 +315,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 	struct mtk_smi *common;
 	struct resource *res;
 	enum mtk_smi_gen smi_gen;
+	int ret;
 
 	if (!dev->pm_domain)
 		return -EPROBE_DEFER;
@@ -349,7 +350,9 @@ static int mtk_smi_common_probe(struct platform_device *pdev)
 		if (IS_ERR(common->clk_async))
 			return PTR_ERR(common->clk_async);
 
-		clk_prepare_enable(common->clk_async);
+		ret = clk_prepare_enable(common->clk_async);
+		if (ret)
+			return ret;
 	}
 	pm_runtime_enable(dev);
 	platform_set_drvdata(pdev, common);
-- 
1.9.1



Re: [PATCH 3/3] ACPI / scan: Enable GPEs before scanning the namespace

2017-08-09 Thread Lukas Wunner
On Thu, Aug 10, 2017 at 12:34:23AM +0200, Rafael J. Wysocki wrote:
> --- linux-pm.orig/drivers/acpi/scan.c
> +++ linux-pm/drivers/acpi/scan.c
> @@ -2139,6 +2139,10 @@ int __init acpi_scan_init(void)
>   acpi_get_spcr_uart_addr();
>   }
>  
> + acpi_gpe_apply_masked_gpes();
> + acpi_update_all_gpes();
> + acpi_ec_ecdt_start();
> +
>   mutex_lock(&acpi_scan_lock);
>   /*
>* Enumerate devices in the ACPI namespace.

I notice this is called from a subsys_initcall().  We scan the PCI bus
much earlier in arch/x86/kernel/early-quirks.c and it would be possible
to identify presence of Thunderbolt host controllers in an early quirk
(using the method of pci_is_thunderbolt_attached()) and, if found,
enable their GPEs or all GPEs.

Just as an aside in case your method doesn't work, I'm not affected by
this issue being a Mac user... ;-)

Thanks,

Lukas


Re: [PATCH] cpufreq: x86: Disable interrupts during MSRs reading

2017-08-09 Thread Len Brown
Thanks, Doug!

Rafael,

Reviewed-by: Len Brown 


On Tue, Aug 8, 2017 at 5:12 PM, Doug Smythies  wrote:
> According to Intel 64 and IA-32 Architectures SDM, Volume 3,
> Chapter 14.2, "Software needs to exercise care to avoid delays
> between the two RDMSRs (for example interrupts)".
>
> So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.
>
> See also:
> commit 4ab60c3f32c721e46217e762bcd3e55a8f659c04
> cpufreq: intel_pstate: Disable interrupts during MSRs reading
>
> Signed-off-by: Doug Smythies 
> ---
>  arch/x86/kernel/cpu/aperfmperf.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index 7cf7c70..0ee8332 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -40,13 +40,16 @@ static void aperfmperf_snapshot_khz(void *dummy)
>         struct aperfmperf_sample *s = this_cpu_ptr(&samples);
>         ktime_t now = ktime_get();
>         s64 time_delta = ktime_ms_delta(now, s->time);
> +       unsigned long flags;
>
>         /* Don't bother re-computing within the cache threshold time. */
>         if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
>                 return;
>
> +       local_irq_save(flags);
>         rdmsrl(MSR_IA32_APERF, aperf);
>         rdmsrl(MSR_IA32_MPERF, mperf);
> +       local_irq_restore(flags);
>
>         aperf_delta = aperf - s->aperf;
>         mperf_delta = mperf - s->mperf;
> --
> 2.7.4
>



-- 
Len Brown, Intel Open Source Technology Center
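
For readers following along, a rough userspace sketch of why the gap
between the two reads matters. The effective frequency is estimated from
the ratio of the APERF and MPERF deltas, so any delay (such as an
interrupt) between the two reads skews one counter relative to the other.
The struct and function names below are hypothetical and only illustrate
the arithmetic; this is not the code in aperfmperf.c:

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters, taken back to back. */
struct aperfmperf_snapshot {
	uint64_t aperf;
	uint64_t mperf;
};

/*
 * Estimate the average frequency between two snapshots as
 * base_khz * aperf_delta / mperf_delta. Unsigned subtraction keeps the
 * deltas correct across counter wrap-around. Returns 0 when MPERF did
 * not advance, since the ratio is undefined in that case.
 */
uint64_t effective_khz(uint64_t base_khz,
		       struct aperfmperf_snapshot prev,
		       struct aperfmperf_snapshot cur)
{
	uint64_t aperf_delta = cur.aperf - prev.aperf;
	uint64_t mperf_delta = cur.mperf - prev.mperf;

	if (mperf_delta == 0)
		return 0;

	return base_khz * aperf_delta / mperf_delta;
}
```

If an interrupt lands between the two reads, MPERF keeps counting while
the APERF value is already captured, so the ratio comes out too low.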


[PATCH v2 3/3] dt-bindings: ASoC: rockchip: Add rockchip,codec-names property

2017-08-09 Thread Jeffy Chen
Add a new rockchip,codec-names property, so that the driver can parse
the codecs by name.

Signed-off-by: Jeffy Chen 
---

Changes in v2:
Let rockchip,codec-names be a required property.

 Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt b/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt
index eac91db..05351df 100644
--- a/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt
+++ b/Documentation/devicetree/bindings/sound/rockchip,rk3399-gru-sound.txt
@@ -5,6 +5,7 @@ Required properties:
 - rockchip,cpu: The phandle of the Rockchip I2S controller that's
   connected to the codecs
 - rockchip,codec: The phandle of the MAX98357A/RT5514/DA7219 codecs
+- rockchip,codec-names: The names of the MAX98357A/RT5514/DA7219 codecs
 
 Optional properties:
 - dmic-wakeup-delay-ms : specify delay time (ms) for DMIC ready.
@@ -18,5 +19,6 @@ sound {
compatible = "rockchip,rk3399-gru-sound";
rockchip,cpu = <>;
rockchip,codec = <  >;
+   rockchip,codec-names = "MAX98357A", "RT5514", "DA7219";
dmic-wakeup-delay-ms = <20>;
 };
-- 
2.1.4
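
As a rough illustration of parsing codecs by name, the lookup could be
sketched as below. This is a userspace sketch under assumptions:
codec_index_by_name() and gru_codec_names are hypothetical, and the idea
that each name lines up positionally with an entry in rockchip,codec is
an assumption not stated in the binding text. In the kernel,
of_property_match_string() performs this kind of lookup on a string-list
property; whether the driver in this series uses it is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Sample data mirroring the example in the binding document. */
static const char *const gru_codec_names[] = {
	"MAX98357A", "RT5514", "DA7219",
};

/*
 * Return the position of 'want' in a list of codec names, or -1 if the
 * name is not listed (for example because a board's dts leaves it out).
 */
int codec_index_by_name(const char *const *names, size_t count,
			const char *want)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(names[i], want) == 0)
			return (int)i;
	}

	return -1;
}
```

A caller could skip setting up a dai link whenever the lookup returns -1,
which is one way a board could disable a codec purely from the dts.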




[PATCH v2 2/3] arm64: dts: rockchip: Add rockchip,codec-names property

2017-08-09 Thread Jeffy Chen
Add rockchip,codec-names property for codecs.

Signed-off-by: Jeffy Chen 
---

Changes in v2: None

 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
index d48e98b..c8f7f0c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi
@@ -515,6 +515,7 @@
compatible = "rockchip,rk3399-gru-sound";
rockchip,cpu = < >;
rockchip,codec = <  >;
+   rockchip,codec-names = "MAX98357A", "RT5514", "DA7219";
};
 };
 
-- 
2.1.4




[PATCH v2 0/3] ASoC: rockchip: Parse dai links from dts

2017-08-09 Thread Jeffy Chen

Currently we are using a fixed list of dai links in the driver.
This series of patches lets the driver parse dai links from the
dts, so that we can disable some of them for future boards in the
dts.

Tested on my Chromebook bob (with the cros 4.4 kernel); it still works
after disabling the rt5514 codec in the dts.
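
The name-based lookup described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: struct dai_template, struct dt_codec, match_name() and parse_dais() are hypothetical stand-ins for the fixed dai link table, the rockchip,codec/rockchip,codec-names pair, of_property_match_string() and the new parse helper. The "available" flag models a codec node that is disabled in the dts.

```c
#include <string.h>

/* Hypothetical stand-in for one entry of the driver's fixed dai link table. */
struct dai_template {
	const char *name;
};

/* Hypothetical stand-in for one codec listed in the device tree. */
struct dt_codec {
	const char *name;	/* entry in rockchip,codec-names */
	int available;		/* 0 models a node disabled in the dts */
};

/*
 * Return the position of name in the codec list, or -1 if it is absent.
 * This plays the role that of_property_match_string() has in the patch.
 */
static int match_name(const struct dt_codec *codecs, int ncodecs,
		      const char *name)
{
	for (int i = 0; i < ncodecs; i++)
		if (strcmp(codecs[i].name, name) == 0)
			return i;
	return -1;
}

/*
 * Walk the fixed table and keep only the links whose codec is both named
 * in the device tree and available. The kept names are stored in out[],
 * which must have room for ntmpl entries; the return value is the number
 * of links kept.
 */
static int parse_dais(const struct dai_template *tmpl, int ntmpl,
		      const struct dt_codec *codecs, int ncodecs,
		      const char **out)
{
	int num_links = 0;

	for (int i = 0; i < ntmpl; i++) {
		int index = match_name(codecs, ncodecs, tmpl[i].name);

		if (index < 0)
			continue;	/* not listed in codec-names */
		if (!codecs[index].available)
			continue;	/* listed, but disabled */
		out[num_links++] = tmpl[i].name;
	}
	return num_links;
}
```

With a table of MAX98357A, RT5514 and DA7219 and a device tree that lists all three but disables RT5514, parse_dais() keeps two links, which matches the test described above where the rt5514 codec was disabled.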


Changes in v2:
Let rockchip,codec-names be a required property, because we plan to
add more supported codecs to the fixed dai link list in the driver.
Let rockchip,codec-names be a required property.

Jeffy Chen (3):
  ASoC: rockchip: Parse dai links from dts
  arm64: dts: rockchip: Add rockchip,codec-names property
  dt-bindings: ASoC: rockchip: Add rockchip,codec-names property

 .../bindings/sound/rockchip,rk3399-gru-sound.txt   |   2 +
 arch/arm64/boot/dts/rockchip/rk3399-gru.dtsi   |   1 +
 sound/soc/rockchip/rk3399_gru_sound.c  | 125 +
 3 files changed, 84 insertions(+), 44 deletions(-)

-- 
2.1.4




[PATCH v2 1/3] ASoC: rockchip: Parse dai links from dts

2017-08-09 Thread Jeffy Chen
Refactor rockchip_sound_probe() to parse dai links from the dts instead
of hard-coding them.

Signed-off-by: Jeffy Chen 
---

Changes in v2:
Let rockchip,codec-names be a required property, because we plan to
add more supported codecs to the fixed dai link list in the driver.

 sound/soc/rockchip/rk3399_gru_sound.c | 125 ++
 1 file changed, 81 insertions(+), 44 deletions(-)

diff --git a/sound/soc/rockchip/rk3399_gru_sound.c 
b/sound/soc/rockchip/rk3399_gru_sound.c
index 3475c61..03b7fae 100644
--- a/sound/soc/rockchip/rk3399_gru_sound.c
+++ b/sound/soc/rockchip/rk3399_gru_sound.c
@@ -247,9 +247,7 @@ enum {
DAILINK_RT5514_DSP,
 };
 
-#define DAILINK_ENTITIES   (DAILINK_DA7219 + 1)
-
-static struct snd_soc_dai_link rockchip_dailinks[] = {
+static const struct snd_soc_dai_link rockchip_dais[] = {
[DAILINK_MAX98357A] = {
.name = "MAX98357A",
.stream_name = "MAX98357A PCM",
@@ -290,8 +288,6 @@ static struct snd_soc_dai_link rockchip_dailinks[] = {
 static struct snd_soc_card rockchip_sound_card = {
.name = "rk3399-gru-sound",
.owner = THIS_MODULE,
-   .dai_link = rockchip_dailinks,
-   .num_links =  ARRAY_SIZE(rockchip_dailinks),
.dapm_widgets = rockchip_dapm_widgets,
.num_dapm_widgets = ARRAY_SIZE(rockchip_dapm_widgets),
.dapm_routes = rockchip_dapm_routes,
@@ -305,71 +301,112 @@ static int rockchip_sound_match_stub(struct device *dev, 
void *data)
return 1;
 }
 
-static int rockchip_sound_probe(struct platform_device *pdev)
+static int rockchip_sound_of_parse_dais(struct device *dev,
+   struct snd_soc_card *card)
 {
-   struct snd_soc_card *card = &rockchip_sound_card;
+   struct device *rt5514_dev;
+   struct device_driver *rt5514_drv;
struct device_node *cpu_node;
-   struct device *dev;
-   struct device_driver *drv;
-   int i, ret;
-
-   cpu_node = of_parse_phandle(pdev->dev.of_node, "rockchip,cpu", 0);
-   if (!cpu_node) {
-   dev_err(&pdev->dev, "Property 'rockchip,cpu' missing or invalid\n");
-   return -EINVAL;
-   }
-
-   for (i = 0; i < DAILINK_ENTITIES; i++) {
-   rockchip_dailinks[i].platform_of_node = cpu_node;
-   rockchip_dailinks[i].cpu_of_node = cpu_node;
+   struct device_node *np_codec;
+   struct snd_soc_dai_link *dai;
+   bool has_rt5514 = false;
+   int i, index, ret;
+
+   card->dai_link = devm_kzalloc(dev, sizeof(rockchip_dais),
+ GFP_KERNEL);
+   if (!card->dai_link)
+   return -ENOMEM;
+
+   cpu_node = of_parse_phandle(dev->of_node, "rockchip,cpu", 0);
+
+   card->num_links = 0;
+   for (i = 0; i < DAILINK_RT5514_DSP; i++) {
+   index = of_property_match_string(dev->of_node,
+   "rockchip,codec-names",
+   rockchip_dais[i].name);
+   if (index < 0)
+   continue;
+
+   np_codec = of_parse_phandle(dev->of_node,
+   "rockchip,codec", index);
+   if (!np_codec) {
+   dev_err(dev, "Missing 'rockchip,codec' for %s\n",
+   rockchip_dais[i].name);
+   return -EINVAL;
+   }
+   if (!of_device_is_available(np_codec))
+   continue;
 
-   rockchip_dailinks[i].codec_of_node =
-   of_parse_phandle(pdev->dev.of_node, "rockchip,codec", i);
-   if (!rockchip_dailinks[i].codec_of_node) {
-   dev_err(&pdev->dev,
-   "Property[%d] 'rockchip,codec' missing or invalid\n", i);
+   if (!cpu_node) {
+   dev_err(dev, "Missing 'rockchip,cpu' for %s\n",
+   rockchip_dais[i].name);
return -EINVAL;
}
+
+   dai = &card->dai_link[card->num_links++];
+   *dai = rockchip_dais[i];
+
+   dai->codec_of_node = np_codec;
+   dai->platform_of_node = cpu_node;
+   dai->cpu_of_node = cpu_node;
+
+   if (i == DAILINK_RT5514)
+   has_rt5514 = true;
}
 
+   if (!has_rt5514)
+   return 0;
+
/**
 * To acquire the spi driver of the rt5514 and set the dai-links names
 * for soc_bind_dai_link
 */
-   drv = driver_find("rt5514", &spi_bus_type);
-   if (!drv) {
-   dev_err(&pdev->dev, "Can not find the rt5514 driver at the spi bus\n");
+   rt5514_drv = driver_find("rt5514", &spi_bus_type);
+   if (!rt5514_drv) {
+   dev_err(dev, "Can not find the rt5514 driver at the spi bus\n");
return -EINVAL;
}
 
-   dev = driver_find_device(drv, NULL, NULL, 

[PATCH linux-next v5 1/1] spi: imx: dynamic burst length adjust for PIO mode

2017-08-09 Thread Jiada Wang
Previously, the burst length (BURST_LENGTH) was always set equal to
bits_per_word, which causes a 10us gap between words in a transfer
and significantly affects performance.

This patch uses 32-bit transfers to simulate transfers with fewer bits
per word, and adjusts the burst length at run time to use the largest
burst length possible, reducing the gaps in transfers in PIO mode.
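
As a rough illustration of why packing narrower words into 32-bit FIFO accesses needs the swaps seen in the diff below, here is a userspace sketch. It assumes a little-endian host and assumes the controller shifts the most significant bit of each 32-bit FIFO word out first; swap_bytes32(), swap_halves32() and pack_fifo_word() are hypothetical names, not functions from spi-imx.c.

```c
#include <stdint.h>

/*
 * Reverse all four bytes of a 32-bit value. On a little-endian CPU this
 * is the effect of cpu_to_be32().
 */
static uint32_t swap_bytes32(uint32_t val)
{
	return ((val & 0x000000ffu) << 24) | ((val & 0x0000ff00u) << 8) |
	       ((val & 0x00ff0000u) >> 8)  | ((val & 0xff000000u) >> 24);
}

/* Exchange the upper and lower 16-bit halves of a 32-bit value. */
static uint32_t swap_halves32(uint32_t val)
{
	return (val << 16) | (val >> 16);
}

/*
 * Turn a 32-bit value loaded from the buffer on a little-endian host into
 * the word written to the FIFO, so that the first word in memory ends up
 * in the most significant position.
 *   bytes_per_word == 1: four 8-bit words, reverse the bytes
 *   bytes_per_word == 2: two 16-bit words, swap the halves
 *   otherwise:           one 32-bit word, nothing to do
 */
static uint32_t pack_fifo_word(uint32_t val, unsigned int bytes_per_word)
{
	if (bytes_per_word == 1)
		return swap_bytes32(val);
	if (bytes_per_word == 2)
		return swap_halves32(val);
	return val;
}
```

For example, the buffer bytes 0x11 0x22 0x33 0x44 load as 0x44332211 on a little-endian CPU, and pack_fifo_word(0x44332211, 1) gives 0x11223344, which puts 0x11 in the top byte of the FIFO word.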

Signed-off-by: Jiada Wang 
---
Changes in v5:
* Rebased on latest linux-next tree
* corrected burst_length value in spi_imx_buf_tx_swap()
* fixed compile error issue caused by incorrect code rebase in v4

Changes in v4:
* Rebased to latest linux-next tree
* Added support to bits_per_word other than 8,16,32 as well
* Renamed several variables
* Added dynamic_burst to spi_imx_devtype_data
* Removed change from mx51_ecspi_config()

Changes in v3:
* Only allow dynamic burst in PIO mode
* Avoid direct manipulation of tx_buf & rx_buf

Changes in v2:
* used cpu_to_* functions to ensure this patch works for both
  little & big endian kernel.

 drivers/spi/spi-imx.c | 150 +++---
 1 file changed, 141 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 930e475..cc808a1 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -56,6 +56,7 @@
 
 /* The maximum  bytes that a sdma BD can transfer.*/
 #define MAX_SDMA_BD_BYTES  (1 << 15)
+#define MX51_ECSPI_CTRL_MAX_BURST  512
 
 enum spi_imx_devtype {
IMX1_CSPI,
@@ -77,6 +78,7 @@ struct spi_imx_devtype_data {
void (*reset)(struct spi_imx_data *);
bool has_dmamode;
unsigned int fifo_size;
+   bool dynamic_burst;
enum spi_imx_devtype devtype;
 };
 
@@ -97,12 +99,14 @@ struct spi_imx_data {
unsigned int bits_per_word;
unsigned int spi_drctl;
 
-   unsigned int count;
+   unsigned int count, remainder;
void (*tx)(struct spi_imx_data *);
void (*rx)(struct spi_imx_data *);
void *rx_buf;
const void *tx_buf;
unsigned int txfifo; /* number of words pushed in tx FIFO */
+   unsigned int dynamic_burst, read_u32;
+   unsigned int word_mask;
 
/* DMA */
bool usedma;
@@ -231,6 +235,7 @@ static bool spi_imx_can_dma(struct spi_master *master, 
struct spi_device *spi,
return false;
 
spi_imx->wml = i;
+   spi_imx->dynamic_burst = 0;
 
return true;
 }
@@ -245,6 +250,7 @@ static bool spi_imx_can_dma(struct spi_master *master, 
struct spi_device *spi,
 #define MX51_ECSPI_CTRL_PREDIV_OFFSET  12
 #define MX51_ECSPI_CTRL_CS(cs) ((cs) << 18)
 #define MX51_ECSPI_CTRL_BL_OFFSET  20
+#define MX51_ECSPI_CTRL_BL_MASK(0xfff << 20)
 
 #define MX51_ECSPI_CONFIG  0x0c
 #define MX51_ECSPI_CONFIG_SCLKPHA(cs)  (1 << ((cs) +  0))
@@ -272,6 +278,102 @@ static bool spi_imx_can_dma(struct spi_master *master, 
struct spi_device *spi,
 #define MX51_ECSPI_TESTREG 0x20
 #define MX51_ECSPI_TESTREG_LBC BIT(31)
 
+static void spi_imx_buf_rx_swap_u32(struct spi_imx_data *spi_imx)
+{
+   unsigned int val = readl(spi_imx->base + MXC_CSPIRXDATA);
+   unsigned int bytes_per_word;
+
+   if (spi_imx->rx_buf) {
+#ifdef __LITTLE_ENDIAN
+   bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+   if (bytes_per_word == 1)
+   val = cpu_to_be32(val);
+   else if (bytes_per_word == 2)
+   val = (val << 16) | (val >> 16);
+#endif
+   val &= spi_imx->word_mask;
+   *(u32 *)spi_imx->rx_buf = val;
+   spi_imx->rx_buf += sizeof(u32);
+   }
+}
+
+static void spi_imx_buf_rx_swap(struct spi_imx_data *spi_imx)
+{
+   unsigned int bytes_per_word;
+
+   bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+   if (spi_imx->read_u32) {
+   spi_imx_buf_rx_swap_u32(spi_imx);
+   return;
+   }
+
+   if (bytes_per_word == 1)
+   spi_imx_buf_rx_u8(spi_imx);
+   else if (bytes_per_word == 2)
+   spi_imx_buf_rx_u16(spi_imx);
+}
+
+static void spi_imx_buf_tx_swap_u32(struct spi_imx_data *spi_imx)
+{
+   u32 val = 0;
+   unsigned int bytes_per_word;
+
+   if (spi_imx->tx_buf) {
+   val = *(u32 *)spi_imx->tx_buf;
+   val &= spi_imx->word_mask;
+   spi_imx->tx_buf += sizeof(u32);
+   }
+
+   spi_imx->count -= sizeof(u32);
+#ifdef __LITTLE_ENDIAN
+   bytes_per_word = spi_imx_bytes_per_word(spi_imx->bits_per_word);
+
+   if (bytes_per_word == 1)
+   val = cpu_to_be32(val);
+   else if (bytes_per_word == 2)
+   val = (val << 16) | (val >> 16);
+#endif
+   writel(val, spi_imx->base + MXC_CSPITXDATA);
+}
+
+static void spi_imx_buf_tx_swap(struct spi_imx_data *spi_imx)
+{
+   u32 ctrl, val;
+   unsigned int bytes_per_word;
+
+   if (spi_imx->count == spi_imx->remainder) {
+  

Re: [PATCH v2] f2fs: add app/fs io stat

2017-08-09 Thread Jaegeuk Kim
Hi Chao,

I've fixed the below in f2fs.git.

On 08/02, Chao Yu wrote:
> From: Chao Yu 
> 
> This patch enables inner app/fs io stats and introduces the virtual fs
> nodes below for exposing stats info:
> /sys/fs/f2fs//iostat_enable
> /proc/fs/f2fs//iostat_info
> 
> Signed-off-by: Chao Yu 
> ---
> v2:
> - reorganize printed info of iostat_info.
> - add discard stats.
>  fs/f2fs/checkpoint.c | 34 +-
>  fs/f2fs/data.c   | 35 +++
>  fs/f2fs/f2fs.h   | 59 
> +---
>  fs/f2fs/file.c   |  7 ++-
>  fs/f2fs/gc.c |  3 +++
>  fs/f2fs/inline.c |  1 +
>  fs/f2fs/node.c   | 15 +++--
>  fs/f2fs/segment.c| 21 +--
>  fs/f2fs/super.c  |  4 
>  fs/f2fs/sysfs.c  | 52 +
>  10 files changed, 200 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 3c84a2520796..da5b49183e09 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -230,8 +230,9 @@ void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t 
> index)
>   ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true);
>  }
>  
> -static int f2fs_write_meta_page(struct page *page,
> - struct writeback_control *wbc)
> +static int __f2fs_write_meta_page(struct page *page,
> + struct writeback_control *wbc,
> + enum iostat_type io_type)
>  {
>   struct f2fs_sb_info *sbi = F2FS_P_SB(page);
>  
> @@ -244,7 +245,7 @@ static int f2fs_write_meta_page(struct page *page,
>   if (unlikely(f2fs_cp_error(sbi)))
>   goto redirty_out;
>  
> - write_meta_page(sbi, page);
> + write_meta_page(sbi, page, io_type);
>   dec_page_count(sbi, F2FS_DIRTY_META);
>  
>   if (wbc->for_reclaim)
> @@ -263,6 +264,12 @@ static int f2fs_write_meta_page(struct page *page,
>   return AOP_WRITEPAGE_ACTIVATE;
>  }
>  
> +static int f2fs_write_meta_page(struct page *page,
> + struct writeback_control *wbc)
> +{
> + return __f2fs_write_meta_page(page, wbc, FS_META_IO);
> +}
> +
>  static int f2fs_write_meta_pages(struct address_space *mapping,
>   struct writeback_control *wbc)
>  {
> @@ -283,7 +290,7 @@ static int f2fs_write_meta_pages(struct address_space 
> *mapping,
>  
>   trace_f2fs_writepages(mapping->host, wbc, META);
>   diff = nr_pages_to_write(sbi, META, wbc);
> - written = sync_meta_pages(sbi, META, wbc->nr_to_write);
> + written = sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
>   mutex_unlock(&sbi->cp_mutex);
>   wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
>   return 0;
> @@ -295,7 +302,7 @@ static int f2fs_write_meta_pages(struct address_space 
> *mapping,
>  }
>  
>  long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
> - long nr_to_write)
> + long nr_to_write, enum iostat_type io_type)
>  {
>   struct address_space *mapping = META_MAPPING(sbi);
>   pgoff_t index = 0, end = ULONG_MAX, prev = ULONG_MAX;
> @@ -346,7 +353,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum 
> page_type type,
>   if (!clear_page_dirty_for_io(page))
>   goto continue_unlock;
>  
> - if (mapping->a_ops->writepage(page, &wbc)) {
> + if (__f2fs_write_meta_page(page, &wbc, io_type)) {
>   unlock_page(page);
>   break;
>   }
> @@ -904,7 +911,14 @@ int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum 
> inode_type type)
>   if (inode) {
>   unsigned long cur_ino = inode->i_ino;
>  
> + if (is_dir)
> + F2FS_I(inode)->cp_task = current;
> +
>   filemap_fdatawrite(inode->i_mapping);
> +
> + if (is_dir)
> + F2FS_I(inode)->cp_task = NULL;
> +
>   iput(inode);
>   /* We need to give cpu to another writers. */
>   if (ino == cur_ino) {
> @@ -1017,7 +1031,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  
>   if (get_pages(sbi, F2FS_DIRTY_NODES)) {
>   up_write(&sbi->node_write);
> - err = sync_node_pages(sbi, &wbc, false);
> + err = sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
>   if (err) {
>   up_write(&sbi->node_change);
>   f2fs_unlock_all(sbi);
> @@ -1115,7 +1129,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
> struct cp_control *cpc)
>  
>   /* Flush all the NAT/SIT pages */
>   while (get_pages(sbi, F2FS_DIRTY_META)) {
> - sync_meta_pages(sbi, META, LONG_MAX);
> + sync_meta_pages(sbi, META, 

[PATCH] device property: use of_graph_get_remote_endpoint() for of_fwnode

2017-08-09 Thread Kuninori Morimoto
From: Kuninori Morimoto 

Now, we can use of_graph_get_remote_endpoint(). Let's use it.

Signed-off-by: Kuninori Morimoto 
---
- not tested

 drivers/of/property.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 067f9fa..ad4eefa 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -913,8 +913,8 @@ static struct fwnode_handle *of_fwnode_get_parent(struct 
fwnode_handle *fwnode)
 static struct fwnode_handle *
 of_fwnode_graph_get_remote_endpoint(struct fwnode_handle *fwnode)
 {
-   return of_fwnode_handle(of_parse_phandle(to_of_node(fwnode),
-"remote-endpoint", 0));
+   return of_fwnode_handle(
+   of_graph_get_remote_endpoint(to_of_node(fwnode)));
 }
 
 static struct fwnode_handle *
-- 
1.9.1



[PATCH] drm/sun4i: use of_graph_get_remote_endpoint()

2017-08-09 Thread Kuninori Morimoto

From: Kuninori Morimoto 

Now, we can use of_graph_get_remote_endpoint(). Let's use it.

Signed-off-by: Kuninori Morimoto 
---
- Not tested

 drivers/gpu/drm/sun4i/sun4i_backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/sun4i/sun4i_backend.c 
b/drivers/gpu/drm/sun4i/sun4i_backend.c
index cf48021..ec59436 100644
--- a/drivers/gpu/drm/sun4i/sun4i_backend.c
+++ b/drivers/gpu/drm/sun4i/sun4i_backend.c
@@ -312,7 +312,7 @@ static int sun4i_backend_of_get_id(struct device_node *node)
struct device_node *remote;
u32 reg;
 
-   remote = of_parse_phandle(ep, "remote-endpoint", 0);
+   remote = of_graph_get_remote_endpoint(ep);
if (!remote)
continue;
 
-- 
1.9.1



Re: [linux-sunxi] [PATCH 1/3] arm64: allwinner: a64: add ethernet0 alias for BPi M64 EMAC node

2017-08-09 Thread Icenowy Zheng


On August 10, 2017 at 11:56:02 AM GMT+08:00, Chen-Yu Tsai wrote:
>Hi,
>
>On Sat, Jul 22, 2017 at 10:28 AM, Icenowy Zheng 
>wrote:
>> The Banana Pi M64 board uses the A64 chip's EMAC to provide Ethernet
>> link.
>>
>> Add the ethernet0 alias in the device tree, in order to let U-Boot
>> generate a MAC address from the chip's SID.
>>
>> Signed-off-by: Icenowy Zheng 
>
>As mentioned in the discussion of the cover letter of this series,
>we'd really like to move this to fixes for 4.13.
>
>I'd like to move forward on this soon. Can I just do a wholesale
>rewrite of the commit message along the lines of the following
>example, and move the 3 patches to fixes for 4.13?

Yes.

Thanks!

>
>arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
>
>The EMAC Ethernet controller was enabled, but an accompanying alias
>was not added. This results in unstable numbering if other Ethernet
>devices, such as a USB dongle, are present. Also, the bootloader uses
>the alias to assign a generated stable MAC address to the device node.
>
>Fixes: e7295499903d ("arm64: allwinner: bananapi-m64: Enable
>dwmac-sun8i")
>
>
>Thanks
>ChenYu


Re: [PATCH] osq_lock: fix osq_lock queue corruption

2017-08-09 Thread Andrea Parri
On Mon, Jul 31, 2017 at 10:54:50PM +0530, Prateek Sood wrote:
> Fix ordering of link creation between node->prev and prev->next in
> osq_lock(). A case in which the status of optimistic spin queue is
> CPU6->CPU2 in which CPU6 has acquired the lock.
> 
>         tail
>           v
>   ,-. <- ,-.
>   |6|    |2|
>   `-' -> `-'
> 
> At this point if CPU0 comes in to acquire osq_lock, it will update the
> tail count.
> 
>   CPU2                    CPU0
>   ----------------------------------
> 
>                                         tail
>                                           v
>                            ,-. <- ,-.    ,-.
>                            |6|    |2|    |0|
>                            `-' -> `-'    `-'
> 
> After tail count update if CPU2 starts to unqueue itself from
> optimistic spin queue, it will find updated tail count with CPU0 and
> update CPU2 node->next to NULL in osq_wait_next().
> 
>   unqueue-A
> 
>                 tail
>                   v
>   ,-. <- ,-.    ,-.
>   |6|    |2|    |0|
>   `-'    `-'    `-'
> 
>   unqueue-B
> 
>   ->tail != curr && !node->next
> 
> If reordering of following stores happen then
> prev->next where prev being CPU2 would be updated to point to CPU0 node:
> 
>                                         tail
>                                           v
>                            ,-. <- ,-.    ,-.
>                            |6|    |2|    |0|
>                            `-' -> `-' -> `-'
> 
>   osq_wait_next()
>     node->next <- 0
>     xchg(node->next, NULL)
> 
>                 tail
>                   v
>   ,-. <- ,-.    ,-.
>   |6|    |2|    |0|
>   `-'    `-'    `-'
> 
>   unqueue-C
> 
> At this point if next instruction
> WRITE_ONCE(next->prev, prev);
> in CPU2 path is committed before the update of CPU0 node->prev = prev then
> CPU0 node->prev will point to CPU6 node.
> 
>                 tail
>     V-----------. v
>   ,-. <- ,-.    ,-.
>   |6|    |2|    |0|
>   `-'    `-'    `-'
>       `-----------^
> 
> At this point if CPU0 path's node->prev = prev is committed resulting
> in change of CPU0 prev back to CPU2 node. CPU2 node->next is NULL
> currently,
> 
>                                         tail
>                                           v
>                            ,-. <- ,-. <- ,-.
>                            |6|    |2|    |0|
>                            `-'    `-'    `-'
>                               `-----------^
> 
> so if CPU0 gets into unqueue path of osq_lock it will keep spinning
> in infinite loop as condition prev->next == node will never be true.
> 
> Signed-off-by: Prateek Sood 
> ---
>  kernel/locking/osq_lock.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index a316794..9f4afa3 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -109,6 +109,19 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  
>   prev = decode_cpu(old);
>   node->prev = prev;
> +
> + /*
> +  * osq_lock()             unqueue
> +  *
> +  * node->prev = prev      osq_wait_next()
> +  * WMB                    MB
> +  * prev->next = node      next->prev = prev //unqueue-C
> +  *
> +  * Here 'node->prev' and 'next->prev' are the same variable and we need
> +  * to ensure these stores happen in-order to avoid corrupting the list.
> +  */

The pattern/behavior of interest remains somewhat implicit in this snippet
(for example, as you described above, a load "reading from" the store to
prev->next is implicit in that osq_wait_next()); however, I was unable to
come up with an alternative without complicating the comment.

Reviewed-by: Andrea Parri 

  Andrea


> + smp_wmb();
> +
>   WRITE_ONCE(prev->next, node);
>  
>   /*
> -- 
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, 
> Inc., 
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
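
For readers following the ordering argument, here is a minimal userspace sketch of the publication pattern the patch enforces. It uses C11 atomics rather than the kernel's smp_wmb() and WRITE_ONCE(), and the names qnode, enqueue_after() and unqueue_fix_prev() are hypothetical, not the kernel's code. A release store stands in for smp_wmb() followed by WRITE_ONCE(prev->next, node); this is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct optimistic_spin_node. */
struct qnode {
	struct qnode *_Atomic next;
	struct qnode *_Atomic prev;
};

/*
 * Enqueue side: set node->prev first, then publish the node through
 * prev->next. The release store keeps the node->prev store ordered before
 * the publication, which is the role smp_wmb() plays in the patch.
 */
static void enqueue_after(struct qnode *prev, struct qnode *node)
{
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->prev, prev, memory_order_relaxed);
	atomic_store_explicit(&prev->next, node, memory_order_release);
}

/*
 * Unqueue side (step C in the commit message): a CPU that found its
 * successor through ->next rewrites the successor's ->prev. The acquire
 * load pairs with the release store above, so this write is ordered after
 * the enqueuer's own node->prev store and cannot be overwritten by it.
 */
static struct qnode *unqueue_fix_prev(struct qnode *self)
{
	struct qnode *next = atomic_load_explicit(&self->next, memory_order_acquire);
	struct qnode *prev = atomic_load_explicit(&self->prev, memory_order_relaxed);

	if (next)
		atomic_store_explicit(&next->prev, prev, memory_order_relaxed);
	return next;
}
```

Run single-threaded, this only shows the final list shape (CPU0's prev ends up pointing at CPU6 once CPU2 leaves); the ordering guarantee itself matters only when the two functions run concurrently on different CPUs.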


Re: [PATCH RFC v2] Add /proc/pid/smaps_rollup

2017-08-09 Thread Minchan Kim
On Wed, Aug 09, 2017 at 05:15:57PM -0700, Daniel Colascione wrote:
> /proc/pid/smaps_rollup is a new proc file that improves the
> performance of user programs that determine aggregate memory
> statistics (e.g., total PSS) of a process.
> 
> Android regularly "samples" the memory usage of various processes in
> order to balance its memory pool sizes. This sampling process involves
> opening /proc/pid/smaps and summing certain fields. For very large
> processes, sampling memory use this way can take several hundred
> milliseconds, due mostly to the overhead of the seq_printf calls in
> task_mmu.c.
> 
> smaps_rollup improves the situation. It contains most of the fields of
> /proc/pid/smaps, but instead of a set of fields for each VMA,
> smaps_rollup instead contains one synthetic smaps-format entry
> representing the whole process. In the single smaps_rollup synthetic
> entry, each field is the summation of the corresponding field in all
> of the real-smaps VMAs. Using a common format for smaps_rollup and
> smaps allows userspace parsers to repurpose parsers meant for use with
> non-rollup smaps for smaps_rollup, and it allows userspace to switch
> between smaps_rollup and smaps at runtime (say, based on the
> availability of smaps_rollup in a given kernel) with minimal fuss.
> 
> By using smaps_rollup instead of smaps, a caller can avoid the
> significant overhead of formatting, reading, and parsing each of a
> large process's potentially very numerous memory mappings. For
> sampling system_server's PSS in Android, we measured a 12x speedup,
> representing a savings of several hundred milliseconds.
> 
> One alternative to a new per-process proc file would have been
> including PSS information in /proc/pid/status. We considered this
> option but thought that PSS would be too expensive (by a few orders of
> magnitude) to collect relative to what's already emitted as part of
> /proc/pid/status, and slowing every user of /proc/pid/status for the
> sake of readers that happen to want PSS feels wrong.
> 
> The code itself works by reusing the existing VMA-walking framework we
> use for regular smaps generation and keeping the mem_size_stats
> structure around between VMA walks instead of using a fresh one for
> each VMA.  In this way, summation happens automatically.  We let
> seq_file walk over the VMAs just as it does for regular smaps and just
> emit nothing to the seq_file until we hit the last VMA.
> 
> Patch changelog:
> 
> v2: Fix typo in commit message
> Add ABI documentation as requested by gregkh
> 
> Signed-off-by: Daniel Colascione 

I love this.

FYI, there was an earlier attempt at this, but it did not succeed at the time:
https://marc.info/?l=linux-kernel&m=147310650003277&w=2
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1229163.html

So this time, I really hope we merge this patch.
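
As an illustration of the "common format" point in the quoted description, a userspace reader could use one routine for both files. The sketch below is an assumption about how such a parser might look, not code from the patch; sum_smaps_field_kb() and process_pss_kb() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum every "<field> <value> kB" line in an smaps-format stream.
 * Works on per-VMA smaps (many matching lines) and on smaps_rollup
 * (one matching line) without modification.
 */
static long sum_smaps_field_kb(FILE *f, const char *field)
{
	char line[256];
	size_t flen = strlen(field);
	long total = 0;

	while (fgets(line, sizeof(line), f)) {
		long kb;

		/* Match the field name at the start of the line, e.g. "Pss:". */
		if (strncmp(line, field, flen) != 0)
			continue;
		if (sscanf(line + flen, "%ld kB", &kb) == 1)
			total += kb;
	}
	return total;
}

/*
 * Prefer the rollup file and fall back to plain smaps when the running
 * kernel does not provide it. Returns -1 if neither file can be opened.
 */
static long process_pss_kb(const char *rollup_path, const char *smaps_path)
{
	FILE *f = fopen(rollup_path, "r");
	long total;

	if (!f)
		f = fopen(smaps_path, "r");
	if (!f)
		return -1;
	total = sum_smaps_field_kb(f, "Pss:");
	fclose(f);
	return total;
}
```

A caller would pass "/proc/self/smaps_rollup" and "/proc/self/smaps" (or the equivalent paths for another pid); the fallback is what lets the same binary run on kernels with and without the new file.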



[PATCH 2/2] cpufreq: schedutil: Always process remote callback with slow switching

2017-08-09 Thread Viresh Kumar
The frequency update from the utilization update handlers can be divided
into two parts:

(A) Finding the next frequency
(B) Updating the frequency

While any CPU can do (A), (B) can be restricted to a group of CPUs only,
depending on the current platform.

For platforms where fast cpufreq switching is possible, both (A) and (B)
are always done from the same CPU and that CPU should be capable of
changing the frequency of the target CPU.

But for platforms where fast cpufreq switching isn't possible, after
doing (A) we wake up a kthread which will eventually do (B). This
kthread is already bound to the right set of CPUs, i.e. only those which
can change the frequency of CPUs of a cpufreq policy. And so any CPU
can actually do (A) in this case, as the frequency is updated from the
right set of CPUs only.

Check cpufreq_can_do_remote_dvfs() only for the fast switching case.

Signed-off-by: Viresh Kumar 
---
 kernel/sched/cpufreq_schedutil.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 504d0752f8f2..cb21cb70e7dc 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -84,13 +84,18 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 *
 * However, drivers cannot in general deal with cross-cpu
 * requests, so while get_next_freq() will work, our
-* sugov_update_commit() call may not.
+* sugov_update_commit() call may not for the fast switching platforms.
 *
 * Hence stop here for remote requests if they aren't supported
 * by the hardware, as calculating the frequency is pointless if
 * we cannot in fact act on it.
+*
+* For the slow switching platforms, the kthread is always scheduled on
+* the right set of CPUs and any CPU can find the next frequency and
+* schedule the kthread.
 */
-   if (!cpufreq_can_do_remote_dvfs(sg_policy->policy))
+   if (policy->fast_switch_enabled &&
+   !cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
 
if (sg_policy->work_in_progress)
-- 
2.13.0.71.gd7076ec9c9cb
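
The gate described in this patch can be written out as a small truth table. The sketch below is a reduced userspace model, not the kernel code: struct policy_model and remote_update_allowed() are hypothetical names, and the two booleans stand in for policy->fast_switch_enabled and the result of cpufreq_can_do_remote_dvfs().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the relevant cpufreq policy state. */
struct policy_model {
	bool fast_switch_enabled;  /* frequency is set directly from the callback */
	bool can_do_remote_dvfs;   /* this CPU may change the target's frequency */
};

/*
 * Models only the remote-request gate discussed above: with fast switching,
 * step (B) runs on the calling CPU, so that CPU must be able to do remote
 * DVFS; with slow switching, step (B) is deferred to the kthread, so any
 * CPU may proceed with step (A).
 */
static bool remote_update_allowed(const struct policy_model *p)
{
	if (p->fast_switch_enabled && !p->can_do_remote_dvfs)
		return false;
	return true;
}
```

Only one of the four combinations is rejected: fast switching on a CPU that cannot perform remote DVFS.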



[PATCH 1/2] cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily

2017-08-09 Thread Viresh Kumar
Utilization update callbacks are now processed remotely, even on the
CPUs that don't share cpufreq policy with the target CPU (if
dvfs_possible_from_any_cpu flag is set).

But in non-fast switch paths, the frequency is changed only from one of
policy->related_cpus. This happens because the kthread which does the
actual update is bound to a subset of CPUs (i.e. related_cpus).

Allow frequency to be remotely updated as well (i.e. call
__cpufreq_driver_target()) if dvfs_possible_from_any_cpu flag is set.

Reported-by: Pavan Kondeti 
Signed-off-by: Viresh Kumar 
---
 kernel/sched/cpufreq_schedutil.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2e74c49776be..504d0752f8f2 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -487,7 +487,11 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
}
 
sg_policy->thread = thread;
-   kthread_bind_mask(thread, policy->related_cpus);
+
+   /* Kthread is bound to all CPUs by default */
+   if (!policy->dvfs_possible_from_any_cpu)
+   kthread_bind_mask(thread, policy->related_cpus);
+
init_irq_work(&sg_policy->irq_work, sugov_irq_work);
mutex_init(&sg_policy->work_lock);
 
-- 
2.13.0.71.gd7076ec9c9cb
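
The effect of the binding change can be modelled with plain bitmasks. This is a hypothetical sketch, not kernel code: struct bind_model and kthread_allowed_cpus() are invented for illustration, and a 64-bit word stands in for a cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two policy fields the patch looks at. */
struct bind_model {
	bool dvfs_possible_from_any_cpu;
	uint64_t related_cpus;  /* bit n set: CPU n belongs to the policy */
};

/*
 * Returns the set of CPUs the governor kthread may run on. When any CPU can
 * change the frequency, the thread keeps its default (all CPUs); otherwise
 * it is restricted to the policy's related CPUs, as before the patch.
 */
static uint64_t kthread_allowed_cpus(const struct bind_model *p, uint64_t all_cpus)
{
	if (p->dvfs_possible_from_any_cpu)
		return all_cpus;
	return p->related_cpus;
}
```

With eight CPUs and a policy covering CPUs 0-3, the thread is limited to 0x0f when the flag is clear and may run on 0xff when it is set.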



Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-09 Thread Minchan Kim
On Wed, Aug 09, 2017 at 09:14:50PM -0700, Nadav Amit wrote:

Hi Nadav,

< snip >

> >>>>> According to the description it is "testcase:brk increase/decrease of one
> >>>>> page”. According to the mode it spawns multiple processes, not threads.
> >>>>> 
> >>>>> Since a single page is unmapped each time, and the iTLB-loads increase
> >>>>> dramatically, I would suspect that for some reason a full TLB flush is
> >>>>> caused during do_munmap().
> >>>>> 
> >>>>> If I find some free time, I’ll try to profile the workload - but feel free
> >>>>> to beat me to it.
> >>>> 
> >>>> The root-cause appears to be that tlb_finish_mmu() does not call
> >>>> dec_tlb_flush_pending() - as it should. Any chance you can take care of it?
> >>> Oops, but with second looking, it seems it's not my fault. ;-)
> >>> https://marc.info/?l=linux-mm&m=150156699114088&w=2
> >>> 
> >>> Anyway, thanks for the pointing out.
> >>> xiaolong.ye, could you retest with this fix?
> >> 
> >> I've queued tests for 5 times and results show this patch (e8f682574e4 "mm:
> >> decrease tlb flush pending count in tlb_finish_mmu") does help recover the
> >> performance back.
> >> 
> >> 378005bdbac0a2ec  76742700225cad9df49f053993  e8f682574e45b6406dadfffeb4
> >> ----------------  --------------------------  --------------------------
> >>          %stddev     change          %stddev     change          %stddev
> >>              \          |                \          |                \
> >>    3405093             -19%    2747088              -2%    3348752        will-it-scale.per_process_ops
> >>       1280 ±  3%        -2%       1257 ±  3%        -6%       1207        vmstat.system.cs
> >>       2702 ± 18%        11%       3002 ± 19%        17%       3156 ± 18%  numa-vmstat.node0.nr_mapped
> >>      10765 ± 18%        11%      11964 ± 19%        17%      12588 ± 18%  numa-meminfo.node0.Mapped
> >>       0.00 ± 47%       -40%       0.00 ± 45%       -84%       0.00 ± 42%  mpstat.cpu.soft%
> >> 
> >> Thanks,
> >> Xiaolong
> > 
> > Thanks for the testing!
> 
> Sorry again for screwing your patch, Minchan.

Never mind! It always happens. :)
This time, I really appreciate your insight, testing, and cooperation!
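
To make the root cause discussed in this thread easier to picture, here is a small userspace model of an unbalanced pending-flush counter. It is a sketch under stated assumptions, not the kernel's code: struct mm_model, unmap_cycle() and the helper names are hypothetical, and it assumes, as the thread suggests, that a count above one makes the finish step take the conservative full-flush path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm pending-flush counter. */
struct mm_model {
	atomic_int tlb_flush_pending;
};

static void inc_pending(struct mm_model *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

static void dec_pending(struct mm_model *mm)
{
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

/* More than one pending batch is taken to mean another flusher is racing. */
static bool nested_flush_pending(struct mm_model *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 1;
}

/*
 * One gather/finish cycle. Returns true when the cycle falls back to a full
 * flush. 'balanced' selects whether the finish step decrements the counter,
 * which is the call reported missing in this thread.
 */
static bool unmap_cycle(struct mm_model *mm, bool balanced)
{
	bool full_flush;

	inc_pending(mm);
	full_flush = nested_flush_pending(mm);
	if (balanced)
		dec_pending(mm);
	return full_flush;
}
```

In this model a balanced sequence never reports a nested flush, while the unbalanced one does from the second cycle onward, even with no concurrency at all. That matches the symptom of a single-page brk workload suddenly paying for full flushes.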


Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-09 Thread Nadav Amit
Minchan Kim  wrote:

> On Wed, Aug 09, 2017 at 10:59:02AM +0800, Ye Xiaolong wrote:
>> On 08/08, Minchan Kim wrote:
>>> On Mon, Aug 07, 2017 at 10:51:00PM -0700, Nadav Amit wrote:
>>>> Nadav Amit  wrote:
>>>> 
>>>>> Minchan Kim  wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On Tue, Aug 08, 2017 at 09:19:23AM +0800, kernel test robot wrote:
>>>>>>> Greeting,
>>>>>>> 
>>>>>>> FYI, we noticed a -19.3% regression of will-it-scale.per_process_ops 
>>>>>>> due to commit:
>>>>>>> 
>>>>>>> 
>>>>>>> commit: 76742700225cad9df49f05399381ac3f1ec3dc60 ("mm: fix 
>>>>>>> MADV_[FREE|DONTNEED] TLB flush miss problem")
>>>>>>> url: 
>>>>>>> https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715
>>>>>>> 
>>>>>>> 
>>>>>>> in testcase: will-it-scale
>>>>>>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 
>>>>>>> with 64G memory
>>>>>>> with following parameters:
>>>>>>> 
>>>>>>> nr_task: 16
>>>>>>> mode: process
>>>>>>> test: brk1
>>>>>>> cpufreq_governor: performance
>>>>>>> 
>>>>>>> test-description: Will It Scale takes a testcase and runs it from 1 
>>>>>>> through to n parallel copies to see if the testcase will scale. It 
>>>>>>> builds both a process and threads based test in order to see any 
>>>>>>> differences between the two.
>>>>>>> test-url: https://github.com/antonblanchard/will-it-scale
>>>>>> 
>>>>>> Thanks for the report.
>>>>>> Could you explain what kinds of workload you are testing?
>>>>>> 
>>>>>> Does it calls frequently madvise(MADV_DONTNEED) in parallel on multiple
>>>>>> threads?
>>>>> 
>>>>> According to the description it is "testcase:brk increase/decrease of one
>>>>> page”. According to the mode it spawns multiple processes, not threads.
>>>>> 
>>>>> Since a single page is unmapped each time, and the iTLB-loads increase
>>>>> dramatically, I would suspect that for some reason a full TLB flush is
>>>>> caused during do_munmap().
>>>>> 
>>>>> If I find some free time, I’ll try to profile the workload - but feel free
>>>>> to beat me to it.
>>>> 
>>>> The root-cause appears to be that tlb_finish_mmu() does not call
>>>> dec_tlb_flush_pending() - as it should. Any chance you can take care of it?
>>> 
>>> Oops, but with second looking, it seems it's not my fault. ;-)
>>> https://marc.info/?l=linux-mm&m=150156699114088&w=2
>>> 
>>> Anyway, thanks for the pointing out.
>>> xiaolong.ye, could you retest with this fix?
>> 
>> I've queued tests for 5 times and results show this patch (e8f682574e4 "mm:
>> decrease tlb flush pending count in tlb_finish_mmu") does help recover the
>> performance back.
>> 
>> 378005bdbac0a2ec  76742700225cad9df49f053993  e8f682574e45b6406dadfffeb4
>> ----------------  --------------------------  --------------------------
>>          %stddev     change          %stddev     change          %stddev
>>              \          |                \          |                \
>>    3405093             -19%    2747088              -2%    3348752        will-it-scale.per_process_ops
>>       1280 ±  3%        -2%       1257 ±  3%        -6%       1207        vmstat.system.cs
>>       2702 ± 18%        11%       3002 ± 19%        17%       3156 ± 18%  numa-vmstat.node0.nr_mapped
>>      10765 ± 18%        11%      11964 ± 19%        17%      12588 ± 18%  numa-meminfo.node0.Mapped
>>       0.00 ± 47%       -40%       0.00 ± 45%       -84%       0.00 ± 42%  mpstat.cpu.soft%
>> 
>> Thanks,
>> Xiaolong
> 
> Thanks for the testing!

Sorry again for screwing your patch, Minchan.




Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-09 Thread Nadav Amit
Minchan Kim  wrote:

> On Wed, Aug 09, 2017 at 10:59:02AM +0800, Ye Xiaolong wrote:
>> On 08/08, Minchan Kim wrote:
>>> On Mon, Aug 07, 2017 at 10:51:00PM -0700, Nadav Amit wrote:
 Nadav Amit  wrote:
 
> Minchan Kim  wrote:
> 
>> Hi,
>> 
>> On Tue, Aug 08, 2017 at 09:19:23AM +0800, kernel test robot wrote:
>>> Greeting,
>>> 
>>> FYI, we noticed a -19.3% regression of will-it-scale.per_process_ops 
>>> due to commit:
>>> 
>>> 
>>> commit: 76742700225cad9df49f05399381ac3f1ec3dc60 ("mm: fix 
>>> MADV_[FREE|DONTNEED] TLB flush miss problem")
>>> url: 
>>> https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715
>>> 
>>> 
>>> in testcase: will-it-scale
>>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 
>>> with 64G memory
>>> with following parameters:
>>> 
>>> nr_task: 16
>>> mode: process
>>> test: brk1
>>> cpufreq_governor: performance
>>> 
>>> test-description: Will It Scale takes a testcase and runs it from 1 
>>> through to n parallel copies to see if the testcase will scale. It 
>>> builds both a process and threads based test in order to see any 
>>> differences between the two.
>>> test-url: https://github.com/antonblanchard/will-it-scale
>> 
>> Thanks for the report.
>> Could you explain what kinds of workload you are testing?
>> 
>> Does it calls frequently madvise(MADV_DONTNEED) in parallel on multiple
>> threads?
> 
> According to the description it is "testcase:brk increase/decrease of one
> page”. According to the mode it spawns multiple processes, not threads.
> 
> Since a single page is unmapped each time, and the iTLB-loads increase
> dramatically, I would suspect that for some reason a full TLB flush is
> caused during do_munmap().
> 
> If I find some free time, I’ll try to profile the workload - but feel free
> to beat me to it.
 
 The root-cause appears to be that tlb_finish_mmu() does not call
 dec_tlb_flush_pending() - as it should. Any chance you can take care of it?
>>> 
>>> Oops, but with second looking, it seems it's not my fault. ;-)
>>> https://marc.info/?l=linux-mm=150156699114088=2
>>> 
>>> Anyway, thanks for the pointing out.
>>> xiaolong.ye, could you retest with this fix?
>> 
>> I've queued tests for 5 times and results show this patch (e8f682574e4 "mm:
>> decrease tlb flush pending count in tlb_finish_mmu") does help recover the
>> performance back.
>> 
>> 378005bdbac0a2ec  76742700225cad9df49f053993  e8f682574e45b6406dadfffeb4
>> ----------------  --------------------------  --------------------------
>>          %stddev  change           %stddev  change           %stddev
>>              \      |                  \      |                  \
>>   3405093           -19%    2747088            -2%    3348752         will-it-scale.per_process_ops
>>      1280 ±  3%      -2%       1257 ±  3%      -6%       1207         vmstat.system.cs
>>      2702 ± 18%      11%       3002 ± 19%      17%       3156 ± 18%   numa-vmstat.node0.nr_mapped
>>     10765 ± 18%      11%      11964 ± 19%      17%      12588 ± 18%   numa-meminfo.node0.Mapped
>>      0.00 ± 47%     -40%       0.00 ± 45%     -84%       0.00 ± 42%   mpstat.cpu.soft%
>> 
>> Thanks,
>> Xiaolong
> 
> Thanks for the testing!

Sorry again for screwing your patch, Minchan.
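
The root cause named in the quoted text (tlb_finish_mmu() not calling dec_tlb_flush_pending()) can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct mm_model, model_gather(), model_finish() and count_forced_flushes() are hypothetical names, and the rule "force a full flush whenever the counter suggests another gather section is in flight" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-mm state; not the kernel's struct mm_struct. */
struct mm_model {
	atomic_int tlb_flush_pending;	/* number of in-flight gather sections */
};

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->tlb_flush_pending, 0);
}

/* Start of a gather section: announce a pending flush. */
static void model_gather(struct mm_model *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/*
 * End of a gather section. Returns true when a conservative full flush is
 * chosen because another section appears to be in flight (count > 1).
 * With balanced == false the decrement is skipped, which models the bug.
 */
static bool model_finish(struct mm_model *mm, bool balanced)
{
	bool force_full = atomic_load(&mm->tlb_flush_pending) > 1;

	if (balanced) {
		int old = atomic_fetch_sub(&mm->tlb_flush_pending, 1);

		assert(old > 0);	/* every finish must pair with a gather */
		(void)old;
	}
	return force_full;
}

/* Run n back-to-back, non-overlapping sections; count forced full flushes. */
static int count_forced_flushes(int n, bool balanced)
{
	struct mm_model mm;
	int forced = 0;

	mm_model_init(&mm);
	for (int i = 0; i < n; i++) {
		model_gather(&mm);
		if (model_finish(&mm, balanced))
			forced++;
	}
	return forced;
}
```

In this model, balanced sections never force a full flush, while a missing decrement makes every section after the first look nested. That would be consistent with the suspicion above that a full TLB flush happens on each single-page unmap.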




Re: [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression

2017-08-09 Thread Minchan Kim
On Wed, Aug 09, 2017 at 10:59:02AM +0800, Ye Xiaolong wrote:
> On 08/08, Minchan Kim wrote:
> >On Mon, Aug 07, 2017 at 10:51:00PM -0700, Nadav Amit wrote:
> >> Nadav Amit  wrote:
> >> 
> >> > Minchan Kim  wrote:
> >> > 
> >> >> Hi,
> >> >> 
> >> >> On Tue, Aug 08, 2017 at 09:19:23AM +0800, kernel test robot wrote:
> >> >>> Greeting,
> >> >>> 
> >> >>> FYI, we noticed a -19.3% regression of will-it-scale.per_process_ops 
> >> >>> due to commit:
> >> >>> 
> >> >>> 
> >> >>> commit: 76742700225cad9df49f05399381ac3f1ec3dc60 ("mm: fix 
> >> >>> MADV_[FREE|DONTNEED] TLB flush miss problem")
> >> >>> url: 
> >> >>> https://github.com/0day-ci/linux/commits/Nadav-Amit/mm-migrate-prevent-racy-access-to-tlb_flush_pending/20170802-205715
> >> >>> 
> >> >>> 
> >> >>> in testcase: will-it-scale
> >> >>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 
> >> >>> with 64G memory
> >> >>> with following parameters:
> >> >>> 
> >> >>>nr_task: 16
> >> >>>mode: process
> >> >>>test: brk1
> >> >>>cpufreq_governor: performance
> >> >>> 
> >> >>> test-description: Will It Scale takes a testcase and runs it from 1 
> >> >>> through to n parallel copies to see if the testcase will scale. It 
> >> >>> builds both a process and threads based test in order to see any 
> >> >>> differences between the two.
> >> >>> test-url: https://github.com/antonblanchard/will-it-scale
> >> >> 
> >> >> Thanks for the report.
> >> >> Could you explain what kinds of workload you are testing?
> >> >> 
> >> >> Does it calls frequently madvise(MADV_DONTNEED) in parallel on multiple
> >> >> threads?
> >> > 
> >> > According to the description it is "testcase:brk increase/decrease of one
> >> > page”. According to the mode it spawns multiple processes, not threads.
> >> > 
> >> > Since a single page is unmapped each time, and the iTLB-loads increase
> >> > dramatically, I would suspect that for some reason a full TLB flush is
> >> > caused during do_munmap().
> >> > 
> >> > If I find some free time, I’ll try to profile the workload - but feel 
> >> > free
> >> > to beat me to it.
> >> 
> >> The root-cause appears to be that tlb_finish_mmu() does not call
> >> dec_tlb_flush_pending() - as it should. Any chance you can take care of it?
> >
> >Oops, but with second looking, it seems it's not my fault. ;-)
> >https://marc.info/?l=linux-mm&m=150156699114088&w=2
> >
> >Anyway, thanks for the pointing out.
> >xiaolong.ye, could you retest with this fix?
> >
> 
> I've queued tests for 5 times and results show this patch (e8f682574e4 "mm:
> decrease tlb flush pending count in tlb_finish_mmu") does help recover the
> performance back.
> 
> 378005bdbac0a2ec  76742700225cad9df49f053993  e8f682574e45b6406dadfffeb4
> ----------------  --------------------------  --------------------------
>          %stddev  change           %stddev  change           %stddev
>              \      |                  \      |                  \
>   3405093           -19%    2747088            -2%    3348752         will-it-scale.per_process_ops
>      1280 ±  3%      -2%       1257 ±  3%      -6%       1207         vmstat.system.cs
>      2702 ± 18%      11%       3002 ± 19%      17%       3156 ± 18%   numa-vmstat.node0.nr_mapped
>     10765 ± 18%      11%      11964 ± 19%      17%      12588 ± 18%   numa-meminfo.node0.Mapped
>      0.00 ± 47%     -40%       0.00 ± 45%     -84%       0.00 ± 42%   mpstat.cpu.soft%
> 
> Thanks,
> Xiaolong

Thanks for the testing!
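
For reference, the "change" columns in the quoted table are relative differences against the base commit (378005bdbac0a2ec). A minimal sketch of that arithmetic follows; pct_change() is a hypothetical helper, not part of the lkp tooling.

```c
#include <assert.h>

/* Relative change of `value` versus `base`, in percent. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}
```

With the will-it-scale.per_process_ops numbers above, pct_change(3405093, 2747088) is about -19.3, matching the regression in the subject line, and pct_change(3405093, 3348752) is about -1.7, which the table rounds to -2%.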


Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability

2017-08-09 Thread Minchan Kim
On Wed, Aug 09, 2017 at 08:04:33PM -0700, Matthew Wilcox wrote:
> On Wed, Aug 09, 2017 at 11:41:50AM +0900, Minchan Kim wrote:
> > On Tue, Aug 08, 2017 at 07:31:22PM -0700, Matthew Wilcox wrote:
> > > On Wed, Aug 09, 2017 at 10:51:13AM +0900, Minchan Kim wrote:
> > > > On Tue, Aug 08, 2017 at 06:29:04AM -0700, Matthew Wilcox wrote:
> > > > > On Tue, Aug 08, 2017 at 05:49:59AM -0700, Matthew Wilcox wrote:
> > > > > > +   struct bio sbio;
> > > > > > +   struct bio_vec sbvec;
> > > > > 
> > > > > ... this needs to be sbvec[nr_pages], of course.
> > > > > 
> > > > > > -   bio = mpage_alloc(bdev, blocks[0] << (blkbits - 9),
> > > > > > +   if (bdi_cap_synchronous_io(inode_to_bdi(inode))) {
> > > > > > +   bio = 
> > > > > > +   bio_init(bio, , nr_pages);
> > > > > 
> > > > > ... and this needs to be 'sbvec', not ''.
> > > > 
> > > > I don't get it why we need sbvec[nr_pages].
> > > > On-stack-bio works with per-page.
> > > > May I miss something?
> > > 
> > > The way I redid it, it will work with an arbitrary number of pages.
> > 
> > IIUC, it would be good things with dynamic bio allocation with passing
> > allocated bio back and forth but on-stack bio cannot work like that.
> > It should be done in per-page so it is worth?
> 
> I'm not passing the bio back and forth between do_mpage_readpage() and
> its callers.  The version I sent allows for multiple pages in a single
> on-stack bio (when called from mpage_readpages()).

I'm confused. I want to confirm your thought before respinning.
Please correct me if I miss something.

The version you sent me used an on-stack bio within do_mpage_readpage(),
which is why I said sbvec[nr_pages] would be pointless: it works on a
per-page basis unless we use dynamic bio allocation.

But I guess you are now suggesting using the on-stack bio in
mpage_readpages(), so that a single on-stack bio on mpage_readpages()'s
stack can batch multiple pages in the bvecs of one bio.

Right?
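
The difference between the two readings can be sketched in user-space C. Everything here is a hypothetical model (struct model_bio, model_bio_init(), model_readpages(), MODEL_MAX_PAGES); it only mirrors the shape of the quoted bio_init(bio, sbvec, nr_pages) call and is not the block layer API.

```c
#include <assert.h>

#define MODEL_MAX_PAGES 16	/* hypothetical cap to keep the stack array small */

struct model_bvec {
	unsigned long pfn;	/* stand-in for a page reference */
};

struct model_bio {
	struct model_bvec *vecs;	/* caller-provided segment table */
	unsigned short max_vecs;
	unsigned short cnt;
};

static void model_bio_init(struct model_bio *bio, struct model_bvec *table,
			   unsigned short max_vecs)
{
	bio->vecs = table;
	bio->max_vecs = max_vecs;
	bio->cnt = 0;
}

/* Returns 1 if the page was added, 0 if the bio is already full. */
static int model_bio_add_page(struct model_bio *bio, unsigned long pfn)
{
	if (bio->cnt >= bio->max_vecs)
		return 0;
	bio->vecs[bio->cnt++].pfn = pfn;
	return 1;
}

/*
 * Read nr_pages pages through one on-stack bio owned by the caller.
 * The bio is "submitted" (here: just reset) whenever it fills up.
 * Returns the number of submissions.
 */
static int model_readpages(unsigned int nr_pages, unsigned short vecs_per_bio)
{
	struct model_bio sbio;
	struct model_bvec sbvec[MODEL_MAX_PAGES];
	int submissions = 0;

	assert(vecs_per_bio >= 1 && vecs_per_bio <= MODEL_MAX_PAGES);
	model_bio_init(&sbio, sbvec, vecs_per_bio);

	for (unsigned int i = 0; i < nr_pages; i++) {
		if (!model_bio_add_page(&sbio, i)) {
			submissions++;		/* bio full: submit and reuse it */
			sbio.cnt = 0;
			model_bio_add_page(&sbio, i);
		}
	}
	if (sbio.cnt)
		submissions++;			/* submit the partially filled tail */
	return submissions;
}
```

With one bvec (the per-page reading), eight pages cost eight submissions; with a table sized for all of them (the sbvec[nr_pages] reading, owned by the outer function), they go out in one.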



Re: [PATCH] f2fs: introduce cur_reserved_blocks in sysfs

2017-08-09 Thread Yunlong Song
I think the aim of reserved_blocks is to leave space for f2fs and the
FTL, so I changed it to a soft version so that it can also be used with
a data image that does not satisfy the hard version. This matters
especially for backward compatibility when a kernel is updated with a
new default reserved_blocks value (currently it defaults to 0, but it
can be set to any value with my new patch).


As for the uid/gid, does the current f2fs space management design
consider this issue? IMO, we can just ensure the reserved space for the
file system regardless of user/system type. Whether the value of
reserved_blocks is OK or not should not be the filesystem's issue. The
filesystem just provides this interface, and the upper layer, such as
Android vold, should take care of the value of reserved_blocks and make
sure it is appropriate and will not block any activation of a system
user. If that really happens, Android should change the value
dynamically; that is fine, since we make reserved_blocks a soft version.


On 2017/8/10 11:15, Chao Yu wrote:

On 2017/8/8 21:43, Yunlong Song wrote:

In this patch, we add a new sysfs interface, we can use it to gradually achieve
the reserved_blocks finally, even when reserved_blocks is initially set over
user_block_count - total_valid_block_count. This is very useful, especially when
we upgrade kernel with new reserved_blocks value, but old disk image unluckily 
has
user_block_count - total_valid_block_count smaller than the desired 
reserved_blocks.
With this patch, f2fs can try its best to reserve space and get close to the
reserved_blocks, and the current value of achieved reserved_blocks can be shown
in real time.
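
The soft reservation described above can be sketched as a small user-space model. This is an illustration under assumptions, not the patch itself: struct sb_model and the sb_model_*() helpers are hypothetical, the initial clamp in sb_model_set_reserved() is assumed from the description (the super.c and sysfs.c hunks are not shown here), and sb_model_alloc() is a simplified form of the trimming done in inc_valid_block_count(). Only the min() update on free mirrors a quoted hunk directly.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

struct sb_model {
	block_t user_block_count;
	block_t total_valid_block_count;
	block_t reserved_blocks;	/* configured target */
	block_t cur_reserved_blocks;	/* reservation achieved so far */
};

static block_t min_block(block_t a, block_t b)
{
	return a < b ? a : b;
}

/* Assumption: the initial reservation is clamped to the current free space. */
static void sb_model_set_reserved(struct sb_model *sb, block_t target)
{
	block_t free_blocks = sb->user_block_count - sb->total_valid_block_count;

	sb->reserved_blocks = target;
	sb->cur_reserved_blocks = min_block(target, free_blocks);
}

/* Allocate up to count blocks; returns how many were actually granted. */
static block_t sb_model_alloc(struct sb_model *sb, block_t count)
{
	block_t avail = sb->user_block_count - sb->cur_reserved_blocks;
	block_t room = avail > sb->total_valid_block_count ?
		       avail - sb->total_valid_block_count : 0;
	block_t granted = min_block(count, room);

	sb->total_valid_block_count += granted;
	return granted;
}

/* Free count blocks; freed space tops up the reservation first. */
static void sb_model_free(struct sb_model *sb, block_t count)
{
	assert(sb->total_valid_block_count >= count);
	sb->total_valid_block_count -= count;
	sb->cur_reserved_blocks = min_block(sb->reserved_blocks,
					    sb->cur_reserved_blocks + count);
}
```

In this model, a 1000-block image with 950 valid blocks and a target of 100 starts at cur_reserved_blocks = 50. Freed blocks raise it toward 100, and new allocations are refused until the target is met and additional space is free.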

Oh, this looks like a soft limitation in quota system, but original
reserved_blocks implementation likes a hard one, so this patch changes the
semantics of reserved_blocks.

Actually, I doubt that it would be hard to reserve all left free space in real
user scenario now, since system user's activation may depend on free space of
data partition due to file creation requirement, so w/o supporting feature of
uid/gid reserved block, soft reservation will block any activation of system
user, such as android.

Thanks,


Signed-off-by: Yunlong Song 
---
  Documentation/ABI/testing/sysfs-fs-f2fs |  6 ++
  fs/f2fs/f2fs.h  |  9 +++--
  fs/f2fs/super.c |  4 +++-
  fs/f2fs/sysfs.c | 15 ++-
  4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
b/Documentation/ABI/testing/sysfs-fs-f2fs
index 11b7f4e..bdbb9f3 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -151,3 +151,9 @@ Date:   August 2017
  Contact:  "Jaegeuk Kim" 
  Description:
 Controls sleep time of GC urgent mode
+
+What:  /sys/fs/f2fs/<disk>/cur_reserved_blocks
+Date:  August 2017
+Contact:   "Yunlong Song" 
+Description:
+Shows current reserved blocks in system.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cea329f..3b7056f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1040,6 +1040,7 @@ struct f2fs_sb_info {
block_t discard_blks;   /* discard command candidats */
block_t last_valid_block_count; /* for recovery */
block_t reserved_blocks;/* configurable reserved blocks 
*/
+   block_t cur_reserved_blocks;/* current reserved blocks */
  
  	u32 s_next_generation;			/* for NFS support */
  
@@ -1514,7 +1515,7 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
  
  	spin_lock(&sbi->stat_lock);

sbi->total_valid_block_count += (block_t)(*count);
-   avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks;
+   avail_user_block_count = sbi->user_block_count - 
sbi->cur_reserved_blocks;
if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
diff = sbi->total_valid_block_count - avail_user_block_count;
*count -= diff;
@@ -1548,6 +1549,8 @@ static inline void dec_valid_block_count(struct 
f2fs_sb_info *sbi,
f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
f2fs_bug_on(sbi, inode->i_blocks < sectors);
sbi->total_valid_block_count -= (block_t)count;
+   sbi->cur_reserved_blocks = min(sbi->reserved_blocks,
+   
sbi->cur_reserved_blocks + count);
spin_unlock(&sbi->stat_lock);
f2fs_i_blocks_write(inode, count, false, true);
  }
@@ -1694,7 +1697,7 @@ static inline int inc_valid_node_count(struct 
f2fs_sb_info *sbi,
spin_lock(&sbi->stat_lock);
  
  	valid_block_count = sbi->total_valid_block_count + 1;

-   if (unlikely(valid_block_count + sbi->reserved_blocks >
+   if (unlikely(valid_block_count + sbi->cur_reserved_blocks >
  

Re: [PATCH v4 05/12] Documentation: net: phy: Add phy-is-internal binding

2017-08-09 Thread Chen-Yu Tsai
On Thu, Aug 10, 2017 at 8:20 AM, Andrew Lunn  wrote:
> On Wed, Aug 09, 2017 at 03:47:34PM -0700, Florian Fainelli wrote:
>> On August 9, 2017 5:10:30 AM PDT, David Wu  wrote:
>> >Add the documentation for internal phy. A boolean property
>> >indicates that a internal phy will be used.
>> >
>> >Signed-off-by: David Wu 
>> >---
>> > Documentation/devicetree/bindings/net/phy.txt | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> >diff --git a/Documentation/devicetree/bindings/net/phy.txt
>> >b/Documentation/devicetree/bindings/net/phy.txt
>> >index b558576..942c892 100644
>> >--- a/Documentation/devicetree/bindings/net/phy.txt
>> >+++ b/Documentation/devicetree/bindings/net/phy.txt
>> >@@ -52,6 +52,9 @@ Optional Properties:
>> >   Mark the corresponding energy efficient ethernet mode as broken and
>> >   request the ethernet to stop advertising it.
>> >
>> >+- phy-is-internal: If set, indicates that phy will connect to the MAC
>> >as a
>> >+  internal phy.
>>
>> Something along the lines of:
>>
>> If set, indicates that the PHY is integrated into the same physical package 
>> as the Ethernet MAC.
>
> Hi Florian, David.
>
> I'm happy with the property name. But i think the text needs more
> description. We deal with Ethernet switches with integrated PHYs. Yet
> for us, this property is unneeded.
>
> Seeing this property means some bit of software needs to ensure the
> internal PHY should be used, when given the choice between an internal
> and external PHY. So i would say something like:
>
> If set, indicates that the PHY is integrated into the same
> physical package as the Ethernet MAC. If needed, muxers should be
> configured to ensure the internal PHY is used. The absence of this
> property indicates the muxers should be configured so that the
> external PHY is used.
>
> This last part is important. If the bootloader has set the internal
> PHY to be used, you need to reset it. Otherwise we are going to get
> into a mess sometime later and need to add a phy-is-external property.

Ack.

One other thing. We need to fix our (sunxi) binding which is already
in 4.13-rc1. We'd like to see this new property in netdev, i.e. merged
for 4.13, so we can use it.

Thanks
ChenYu
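
The semantics Andrew proposes in the quoted text (property present: select the internal PHY; property absent: actively select the external PHY, whatever the bootloader left behind) can be sketched as follows. This is a user-space model with hypothetical names (struct mac_model, mac_model_apply_binding()). In a real driver the flag would presumably come from something like of_property_read_bool() on the PHY node, and the mux write would be SoC-specific.

```c
#include <assert.h>
#include <stdbool.h>

enum phy_mux {
	PHY_MUX_EXTERNAL = 0,
	PHY_MUX_INTERNAL = 1,
};

/* Hypothetical hardware state: whatever the bootloader left in the mux. */
struct mac_model {
	enum phy_mux mux;
};

/*
 * Apply the binding: always program the mux, never inherit the
 * bootloader's setting, so that a missing property means "external".
 */
static enum phy_mux mac_model_apply_binding(struct mac_model *mac,
					    bool phy_is_internal)
{
	mac->mux = phy_is_internal ? PHY_MUX_INTERNAL : PHY_MUX_EXTERNAL;
	return mac->mux;
}
```

The point of writing the mux unconditionally is the last paragraph of the quoted text: if absence of the property did not reset the mux, a later phy-is-external property would be needed to undo a bootloader choice.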


Re: [linux-sunxi] [PATCH 1/3] arm64: allwinner: a64: add ethernet0 alias for BPi M64 EMAC node

2017-08-09 Thread Chen-Yu Tsai
Hi,

On Sat, Jul 22, 2017 at 10:28 AM, Icenowy Zheng  wrote:
> The Banana Pi M64 board uses the A64 chip's EMAC to provide Ethernet
> link.
>
> Add the ethernet0 alias in the device tree, in order to let U-Boot
> generate a MAC address from the chip's SID.
>
> Signed-off-by: Icenowy Zheng 

As mentioned in the discussion of the cover letter of this series,
we'd really like to move this to fixes for 4.13.

I'd like to move forward on this soon. Can I just do a wholesale
rewrite of the commit message along the lines of the following
example, and move the 3 patches to fixes for 4.13?

arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias

The EMAC Ethernet controller was enabled, but an accompanying alias
was not added. This results in unstable numbering if other Ethernet
devices, such as a USB dongle, are present. Also, the bootloader uses
the alias to assign a generated stable MAC address to the device node.

Fixes: e7295499903d ("arm64: allwinner: bananapi-m64: Enable dwmac-sun8i")


Thanks
ChenYu


Re: [PATCH] iio: accel: Bugfix to enbale and allow different events to work parallely.

2017-08-09 Thread Harinath Nampally

> My only suggestion for adding all these chips' orientation features, is
> to start the discussion independently from this driver. Are there other
> device series that provide such an orientation interrupt? Is it worth
> finding a representation in iio?

Given the number of accelerometers that have built-in orientation event
support these days, I think it's worth having a representation in IIO.

> Additionally to portrait up/down, landscape left/right there is
> back/front facing, so you'd have 8 new channel modifiers.

Yes, that's correct, but I wonder if it's a good idea to add 8 (too many!)
new channel modifiers.

> If IIO_ROT is a current userspace "standard" to read for rotating the
> screen, it may be worth discussing how to fit this in without new
> modifiers. Would you have to make up fake angle values? Anything else
> userspace already uses for getting the orientation?

Yes, I agree. I don't think I need to make up fake angle values, but I'm
not sure how userspace gets orientation currently. I need to do some
research on that.

> But again, instead of replying here and going off topic, write up a
> proposal and post it independently.

Sure, will do that. Thanks for your response.

On 08/01/2017 11:50 AM, Martin Kepplinger wrote:

On 2017-08-01 05:08, Harinath Nampally wrote:

Thanks for doing that work. I have had it on my list for a long time
and you seem to fix it. Although I'd happily review and possibly test
it, unfortunately I can't do so before the week of August 21st.

If this might go in quick, nothing will stop me from reviewing either,
so, whatever. Thanks again!

   Sure no problem, looking forward to your review comments.
   Actually I am planning to add Orientation events for FXLS8471Q, for
that is it good idea to overload existing
   IIO_ROT channel type? Also thinking of adding 4 channel modifiers i.e
portrait up/down, landscape left/right.
   Any suggestions are welcome. Thank you.


My only suggestion for adding all these chips' orientation features, is
to start the discussion independently from this driver. Are there other
device series that provide such an orientation interrupt? Is it worth
finding a representation in iio?

Additionally to portrait up/down, landscape left/right there is
back/front facing, so you'd have 8 new channel modifiers.

If IIO_ROT is a current userspace "standard" to read for rotating the
screen, it may be worth discussing how to fit this in without new
modifiers. Would you have to make up fake angle values? Anything else
userspace already uses for getting the orientation?

But again, instead of replying here and going off topic, write up a
proposal and post it independently.




Re: [PATCH] iio: accel: Bugfix to enbale and allow different events to work parallely.

2017-08-09 Thread Harinath Nampally

My only suggestion for adding all these chips' orientation features, is
to start the discussion independently from this driver. Are there other
device series that provide such an orientation interrupt? Is it worth
finding a representation in iio?
Given the number of accelerometers these days have built in orientation 
event support,


I think its worth to have a representation in IIO,


Additionally to portait up/down, landscape left/right there is
back/front facing, so you'd have 8 new channel modifiers.
Yes that's correct but I wonder if its good idea to add 8(too many!) new 
channel modifiers.

If IIO_ROT is a current userspace "standard" to read for rotating the
screen, it may be worth discussing how to fit this in without new
modifiers. Would you have to make up fake angle values? Anything else
userspace already uses for getting the orientation?

Yes I agree, I don't think I need to make up fake angle values, not sure
how userspace gets orientation currently. Need to do some research on that.

But again, instead of replying here and going off topic, write up a
proposal and post it independently.

Sure will do that. Thanks for your response.

On 08/01/2017 11:50 AM, Martin Kepplinger wrote:

On 2017-08-01 05:08, Harinath Nampally wrote:

Thanks for doing that work. I have had it on my list for a long time
and you seem to fix it. Although I'd happily review and possibly test
it, unfortunately I can't do so before the week of August 21st.

If this might go in quick, nothing will stop me from reviewing either,
so, whatever. Thanks again!

   Sure no problem, looking forward to your review comments.
   Actually I am planning to add Orientation events for FXLS8471Q, for
that is it good idea to overload existing
   IIO_ROT channel type? Also thinking of adding 4 channel modifiers i.e
portrait up/down, landscape left/right.
   Any suggestions are welcome. Thank you.


My only suggestion for adding all these chips' orientation features, is
to start the discussion independently from this driver. Are there other
device series that provide such an orientation interrupt? Is it worth
finding a representation in iio?

Additionally to portrait up/down, landscape left/right there is
back/front facing, so you'd have 8 new channel modifiers.

If IIO_ROT is a current userspace "standard" to read for rotating the
screen, it may be worth discussing how to fit this in without new
modifiers. Would you have to make up fake angle values? Anything else
userspace already uses for getting the orientation?

But again, instead of replying here and going off topic, write up a
proposal and post it independently.




Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

2017-08-09 Thread Byungchul Park
On Thu, Aug 10, 2017 at 09:55:56AM +0900, Byungchul Park wrote:
> On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
> > 
> > 
> > Heh, look what it does...
> 
> It does not happen on my machine.
> 
> I think it happens because of your "Simplify xhlock ring buffer
> invalidation" patch.
> 
> First of all, could you revert yours and check whether it still happens?
> If not, we have to think about the simplification more.
> 
> BTW, does your patch consider the possibility that a worker and irqs can
> be nested? Is there no problem even in that case?

In addition, now that each syscall context is isolated, as you suggested,
with crossrelease_hist_end() and crossrelease_hist_start(), contexts can
be nested easily. I want to keep my patches unchanged at first and change
the code carefully.

> 
> > 
> > 
> > ==
> > WARNING: possible circular locking dependency detected
> > 4.13.0-rc2-00317-gadc6764a3adf-dirty #797 Tainted: GW
> > --
> > startpar/582 is trying to acquire lock:
> >  ((complete)>done){+.+.}, at: [] flush_work+0x1fd/0x2c0
> >
> > but task is already holding lock:
> >  (lock#3){+.+.}, at: [] lru_add_drain_all_cpuslocked+0x46/0x1a0
> >
> > which lock already depends on the new lock.
> > 
> >
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #4 (lock#3){+.+.}:
> >        __lock_acquire+0x10a5/0x1100
> >        lock_acquire+0xea/0x1f0
> >        __mutex_lock+0x6c/0x960
> >        mutex_lock_nested+0x1b/0x20
> >        lru_add_drain_all_cpuslocked+0x46/0x1a0
> >        lru_add_drain_all+0x13/0x20
> >        SyS_mlockall+0xb8/0x1c0
> >        entry_SYSCALL_64_fastpath+0x23/0xc2
> > 
> > -> #3 (cpu_hotplug_lock.rw_sem){}:
> >        __lock_acquire+0x10a5/0x1100
> >        lock_acquire+0xea/0x1f0
> >        cpus_read_lock+0x2a/0x90
> >        kmem_cache_create+0x2a/0x1d0
> >        scsi_init_sense_cache+0xa0/0xc0
> >        scsi_add_host_with_dma+0x67/0x360
> >        isci_pci_probe+0x873/0xc90
> >        local_pci_probe+0x42/0xa0
> >        work_for_cpu_fn+0x14/0x20
> >        process_one_work+0x273/0x6b0
> >        worker_thread+0x21b/0x3f0
> >        kthread+0x147/0x180
> >        ret_from_fork+0x2a/0x40
> > 
> > -> #2 (scsi_sense_cache_mutex){+.+.}:
> >        __lock_acquire+0x10a5/0x1100
> >        lock_acquire+0xea/0x1f0
> >        __mutex_lock+0x6c/0x960
> >        mutex_lock_nested+0x1b/0x20
> >        scsi_init_sense_cache+0x3d/0xc0
> >        scsi_add_host_with_dma+0x67/0x360
> >        isci_pci_probe+0x873/0xc90
> >        local_pci_probe+0x42/0xa0
> >        work_for_cpu_fn+0x14/0x20
> >        process_one_work+0x273/0x6b0
> >        worker_thread+0x21b/0x3f0
> >        kthread+0x147/0x180
> >        ret_from_fork+0x2a/0x40
> > 
> > -> #1 (()){+.+.}:
> >        process_one_work+0x244/0x6b0
> >        worker_thread+0x21b/0x3f0
> >        kthread+0x147/0x180
> >        ret_from_fork+0x2a/0x40
> >        0x
> > 
> > -> #0 ((complete)>done){+.+.}:
> >        check_prev_add+0x3be/0x700
> >        __lock_acquire+0x10a5/0x1100
> >        lock_acquire+0xea/0x1f0
> >        wait_for_completion+0x3b/0x130
> >        flush_work+0x1fd/0x2c0
> >        lru_add_drain_all_cpuslocked+0x158/0x1a0
> >        lru_add_drain_all+0x13/0x20
> >        SyS_mlockall+0xb8/0x1c0
> >        entry_SYSCALL_64_fastpath+0x23/0xc2
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >   (complete)>done --> cpu_hotplug_lock.rw_sem --> lock#3
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(lock#3);
> >                                lock(cpu_hotplug_lock.rw_sem);
> >                                lock(lock#3);
> >   lock((complete)>done);
> > 
> >  *** DEADLOCK ***
> > 
> > 2 locks held by startpar/582:
> >  #0:  (cpu_hotplug_lock.rw_sem){}, at: [] lru_add_drain_all+0xe/0x20
> >  #1:  (lock#3){+.+.}, at: [] lru_add_drain_all_cpuslocked+0x46/0x1a0
> > 
> > stack backtrace:
> > CPU: 23 PID: 582 Comm: startpar Tainted: GW 4.13.0-rc2-00317-gadc6764a3adf-dirty #797
> > Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> > Call Trace:
> >  dump_stack+0x86/0xcf
> >  print_circular_bug+0x203/0x2f0
> >  check_prev_add+0x3be/0x700
> >  ? add_lock_to_list.isra.30+0xc0/0xc0
> >  ? is_bpf_text_address+0x82/0xe0
> >  ? unwind_get_return_address+0x1f/0x30
> >  __lock_acquire+0x10a5/0x1100
> >  ? __lock_acquire+0x10a5/0x1100
> >  ? add_lock_to_list.isra.30+0xc0/0xc0
> >  lock_acquire+0xea/0x1f0
> >  ? flush_work+0x1fd/0x2c0
> >  wait_for_completion+0x3b/0x130
> >  ? flush_work+0x1fd/0x2c0
> >  flush_work+0x1fd/0x2c0
> >  ? flush_workqueue_prep_pwqs+0x1c0/0x1c0
> >  ? trace_hardirqs_on+0xd/0x10
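
The "Chain exists of" line in the report above can be read as a path in a lock dependency graph, and the warning fires because the new acquisition would close that path into a cycle. The following is only a toy userspace sketch of that idea, not lockdep's actual implementation; the enum names, the `dep` matrix and both helper functions are made up for illustration.

```c
#include <stdbool.h>

#define TOY_MAX_LOCKS 8

/* Hypothetical lock classes, named after the report above. */
enum { TOY_DONE, TOY_HOTPLUG, TOY_LOCK3 };

/* dep[a][b] is true when b was acquired while a was held. */
static bool dep[TOY_MAX_LOCKS][TOY_MAX_LOCKS];

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool toy_reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_MAX_LOCKS; next++)
		if (dep[from][next] && !seen[next] && toy_reaches(next, to, seen))
			return true;
	return false;
}

/*
 * Record "held -> wanted". If 'wanted' already leads back to 'held',
 * the new edge would close a cycle, so refuse it and return false.
 */
static bool toy_add_dependency(int held, int wanted)
{
	bool seen[TOY_MAX_LOCKS] = { false };

	if (toy_reaches(wanted, held, seen))
		return false;
	dep[held][wanted] = true;
	return true;
}
```

With the two existing edges recorded (completion to cpu_hotplug_lock, cpu_hotplug_lock to lock#3), asking to take the completion while holding lock#3 is rejected, which mirrors the scenario printed in the report.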

Re: [PATCH v2] arm64: arch_timer: avoid infinite recursion when ftrace is enabled

2017-08-09 Thread Ding Tianhong
Adding Daniel and Thomas.

On 2017/8/10 10:52, Ding Tianhong wrote:
> On platforms with an arch timer erratum workaround, it's possible for
> arch_timer_reg_read_stable() to recurse into itself when certain
> tracing options are enabled, leading to stack overflows and related
> problems.
> 
> For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are
> selected, it's possible to trigger this with:
> 
> $ mount -t debugfs nodev /sys/kernel/debug/
> $ echo function_graph > /sys/kernel/debug/tracing/current_tracer
> 
> The problem is that in such cases, preempt_disable() instrumentation
> attempts to acquire a timestamp via trace_clock(), resulting in a call
> back to arch_timer_reg_read_stable(), and hence recursion.
> 
> This patch changes arch_timer_reg_read_stable() to use
> preempt_{disable,enable}_notrace(), which avoids this.
> 
> This problem is similar to the one fixed by upstream commit 96b3d28bf4
> ("sched/clock: Prevent tracing recursion in sched_clock_cpu()").
> 
> Fixes: 6acc71ccac71 ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs")
> Signed-off-by: Ding Tianhong 
> Acked-by: Mark Rutland 
> Acked-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/arch_timer.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 74d08e4..67bb7a4 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -65,13 +65,13 @@ struct arch_timer_erratum_workaround {
>   u64 _val;   \
>   if (needs_unstable_timer_counter_workaround()) {\
>   const struct arch_timer_erratum_workaround *wa; \
> - preempt_disable();  \
> + preempt_disable_notrace();  \
>   wa = __this_cpu_read(timer_unstable_counter_workaround); \
>   if (wa && wa->read_##reg)   \
>   _val = wa->read_##reg();\
>   else\
>   _val = read_sysreg(reg);\
> - preempt_enable();   \
> + preempt_enable_notrace();   \
>   } else {\
>   _val = read_sysreg(reg);\
>   }   \
> 
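
The recursion described in the commit message can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not kernel code: every `toy_` name is hypothetical, the "tracer hook" is reduced to a direct call, and `TOY_MAX_DEPTH` exists only so the traced variant terminates instead of overflowing the stack.

```c
#define TOY_MAX_DEPTH 8   /* cut-off so the toy model terminates */

static int depth;         /* current nesting of toy_read_counter() */
static int deepest;       /* deepest nesting seen in one run */

static unsigned long toy_read_counter(int use_notrace);

/* Traced variant: the tracer hook wants a timestamp, so it reads the counter. */
static void toy_preempt_disable(void)
{
	(void)toy_read_counter(0);
}

/* Untraced variant: no hook runs, so nothing re-enters the counter read. */
static void toy_preempt_disable_notrace(void)
{
}

/* Stand-in for the counter read that needs preemption disabled. */
static unsigned long toy_read_counter(int use_notrace)
{
	depth++;
	if (depth > deepest)
		deepest = depth;
	if (use_notrace)
		toy_preempt_disable_notrace();
	else if (depth < TOY_MAX_DEPTH)
		toy_preempt_disable();
	depth--;
	return 42; /* placeholder counter value */
}

/* Run one counter read and report how deeply it nested. */
static int toy_max_nesting(int use_notrace)
{
	depth = 0;
	deepest = 0;
	(void)toy_read_counter(use_notrace);
	return deepest;
}
```

In this model the traced variant nests until the artificial cut-off is hit, while the `_notrace` variant stays at a nesting depth of one, which is the effect the patch relies on.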



Re: [PATCH] iio: accel: Bugfix to enbale and allow different events to work parallely.

2017-08-09 Thread Harinath Nampally

On Mon, 31 Jul 2017 07:17:38 -0400
Harinath Nampally  wrote:


This driver supports multiple devices like mma8653, mma8652, mma8452, mma8453
and fxls8471. Almost all these devices have more than one event. The current
driver design hardcodes the event-specific information, so only one event can
be supported by this driver, and the current design doesn't have the
flexibility to add more events.

This patch fixes this by detaching the event-related information from the
chip_info struct; based on channel type and event direction, the corresponding
event configuration registers are picked dynamically. Hence multiple events
can be handled in the read/write callbacks.

Changes are thoroughly tested on an fxls8471 device on an imx6UL Eval board
using the iio_event_monitor user space program.

After this fix both Freefall and Transient events are handled by the driver
without any conflicts.

Signed-off-by: Harinath Nampally 

Hi,

A few minor bits and bobs inline.

Jonathan

Thank you for the review.


+ /**
+  * struct mma8452_event_regs - chip specific data related to events
+  * @ev_cfg:             event config register address
+  * @ev_cfg_ele:         latch bit in event config register
+  * @ev_cfg_chan_shift:  number of the bit to enable events in X
+  *                      direction; in event config register
+  * @ev_src:             event source register address
+  * @ev_src_xe:          bit in event source register that indicates
+  *                      an event in X direction
+  * @ev_src_ye:          bit in event source register that indicates
+  *                      an event in Y direction
+  * @ev_src_ze:          bit in event source register that indicates
+  *                      an event in Z direction
+  * @ev_ths:             event threshold register address
+  * @ev_ths_mask:        mask for the threshold value
+  * @ev_count:           event count (period) register address
+  *
+  * Since not all chips supported by the driver support comparing high pass
+  * filtered data for events (interrupts), different interrupt sources are
+  * used for different chips and the relevant registers are included here.
+  */
+struct mma8452_event_regs {
+   u8 ev_cfg;
+   u8 ev_cfg_ele;
+   u8 ev_cfg_chan_shift;
As far as I can tell the above isn't used...
Please sanity check the others

Yes, they are not used and not really necessary, I think; I should probably
remove them.
It makes sense to keep only ev_cfg, ev_src, ev_ths, ev_ths_mask and ev_count,
as they are common to other events as well, such as orientation and
single/double tap.

That way the same struct can be reused across different events in the future.
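
As a rough illustration of picking the event registers from channel type and event direction, here is a minimal standalone sketch. It is not the driver's code: the `toy_` types and function are hypothetical stand-ins for the IIO enums and the driver helper, and the register addresses and masks are placeholder values that should be checked against the datasheet. The falling-to-freefall and rising-to-transient mapping follows the hunk quoted further down in this review.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IIO channel type and event direction. */
enum toy_chan_type { TOY_ACCEL, TOY_OTHER };
enum toy_ev_dir { TOY_DIR_RISING, TOY_DIR_FALLING };

/* Reduced register set, as proposed in the reply above. */
struct toy_event_regs {
	uint8_t ev_cfg;
	uint8_t ev_src;
	uint8_t ev_ths;
	uint8_t ev_ths_mask;
	uint8_t ev_count;
};

/* Placeholder addresses and masks, for illustration only. */
static const struct toy_event_regs toy_freefall_regs = {
	.ev_cfg = 0x15, .ev_src = 0x16, .ev_ths = 0x17,
	.ev_ths_mask = 0x7f, .ev_count = 0x18,
};

static const struct toy_event_regs toy_transient_regs = {
	.ev_cfg = 0x1d, .ev_src = 0x1e, .ev_ths = 0x1f,
	.ev_ths_mask = 0x7f, .ev_count = 0x20,
};

/* Return the register set for this event, or NULL if it is unsupported. */
static const struct toy_event_regs *
toy_get_event_regs(enum toy_chan_type type, enum toy_ev_dir dir)
{
	if (type != TOY_ACCEL)
		return NULL;

	switch (dir) {
	case TOY_DIR_FALLING:
		return &toy_freefall_regs;   /* freefall event */
	case TOY_DIR_RISING:
		return &toy_transient_regs;  /* transient event */
	}
	return NULL;
}
```

A read or write callback would call the lookup once and then use the returned pointer for every register access, so adding another event type only means adding another table and another case.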

  enum {
@@ -394,11 +403,11 @@ static ssize_t mma8452_show_os_ratio_avail(struct device 
*dev,
  }
  
  static IIO_DEV_ATTR_SAMP_FREQ_AVAIL(mma8452_show_samp_freq_avail);

-static IIO_DEVICE_ATTR(in_accel_scale_available, S_IRUGO,
+static IIO_DEVICE_ATTR(in_accel_scale_available, 0444,
   mma8452_show_scale_avail, NULL, 0);
  static IIO_DEVICE_ATTR(in_accel_filter_high_pass_3db_frequency_available,
-  S_IRUGO, mma8452_show_hp_cutoff_avail, NULL, 0);
-static IIO_DEVICE_ATTR(in_accel_oversampling_ratio_available, S_IRUGO,
+  0444, mma8452_show_hp_cutoff_avail, NULL, 0);
+static IIO_DEVICE_ATTR(in_accel_oversampling_ratio_available, 0444,
   mma8452_show_os_ratio_avail, NULL, 0);
Separate change.  Please do it in a precursor patch rather than
adding noise to this one..

Sure I will.

case IIO_EV_INFO_PERIOD:
ret = i2c_smbus_read_byte_data(data->client,
-  data->chip_info->ev_count);
+   
ev_regs.ev_count);
This indenting looks somewhat odd..

Yes I agree, will fix it.

+   switch (chan->type) {
+   case IIO_ACCEL:
+   switch (dir) {
+   case IIO_EV_DIR_FALLING:
+   return mma8452_freefall_mode_enabled(data);
+   case IIO_EV_DIR_RISING:
+   ret = i2c_smbus_read_byte_data(data->client,
+   
MMA8452_TRANSIENT_CFG);
Again, some crazy stuff going on with indenting..

+   if (ret < 0)
+   return ret;
  
-		ret = i2c_smbus_read_byte_data(data->client,

-  data->chip_info->ev_cfg);
-   if (ret < 0)
-   return ret;
+   return ret & MMA8452_TRANSIENT_CFG_CHAN(chan->scan_index) ? 1 : 0;

It's a nasty trick in a way, but commonly used in the kernel.
return !!(ret & MMA8452_TRANSIENT_CFG_CHAN(chan->scan_index));

  
-		
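
The `!!` suggestion above can be checked in isolation. The sketch below compares the ternary form with the double negation; `TOY_CFG_CHAN()` is a hypothetical bit layout chosen for the example, not the driver's macro. Both forms normalise any non-zero masked value to 1, and since `&` binds tighter than `?:`, the ternary in the patch already parses as intended.

```c
/* Hypothetical layout: channel n is enabled by bit n + 1. */
#define TOY_CFG_CHAN(chan) (1u << ((chan) + 1))

/* Ternary form, as in the patch under review. */
static int toy_chan_enabled_ternary(unsigned int reg, int chan)
{
	return (reg & TOY_CFG_CHAN(chan)) ? 1 : 0;
}

/* Double negation: the first ! yields 0 or 1, the second flips it back. */
static int toy_chan_enabled_bangbang(unsigned int reg, int chan)
{
	return !!(reg & TOY_CFG_CHAN(chan));
}
```

The two functions return the same value for every input; the second is simply the spelling more commonly seen in kernel code.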

[PATCH V11 1/3] powernv: powercap: Add support for powercap framework

2017-08-09 Thread Shilpasri G Bhat
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat 
---
 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 arch/powerpc/include/asm/opal-api.h|   3 +
 arch/powerpc/include/asm/opal.h|   5 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/opal.c  |   4 +
 7 files changed, 290 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
new file mode 100644
index 000..c9b66ec
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
@@ -0,0 +1,31 @@
+What:  /sys/firmware/opal/powercap
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Powercap directory for Powernv (P8, P9) servers
+
+   Each folder in this directory contains a
+   power-cappable component.
+
+What:  /sys/firmware/opal/powercap/system-powercap
+   /sys/firmware/opal/powercap/system-powercap/powercap-min
+   /sys/firmware/opal/powercap/system-powercap/powercap-max
+   /sys/firmware/opal/powercap/system-powercap/powercap-current
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   System powercap directory and attributes applicable for
+   Powernv (P8, P9) servers
+
+   This directory provides powercap information. It
+   contains below sysfs attributes:
+
+   - powercap-min : This file provides the minimum
+ possible powercap in Watt units
+
+   - powercap-max : This file provides the maximum
+ possible powercap in Watt units
+
+   - powercap-current : This file provides the current
+ powercap set on the system. Writing to this file
+ creates a request for setting a new-powercap. The
+ powercap requested must be between powercap-min
+ and powercap-max.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index ced1ef2..b87305b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,7 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_TIMEOUT   -33
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -193,6 +194,8 @@
 #define OPAL_IMC_COUNTERS_INIT 149
 #define OPAL_IMC_COUNTERS_START150
 #define OPAL_IMC_COUNTERS_STOP 151
+#define OPAL_GET_POWERCAP  152
+#define OPAL_SET_POWERCAP  153
 #define OPAL_PCI_SET_P2P   157
 #define OPAL_LAST  157
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 5a715e6..6f09ab7 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -275,6 +275,9 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t 
address,
 int64_t opal_imc_counters_start(uint32_t type, uint64_t cpu_pir);
 int64_t opal_imc_counters_stop(uint32_t type, uint64_t cpu_pir);
 
+int opal_get_powercap(u32 handle, int token, u32 *pcap);
+int opal_set_powercap(u32 handle, int token, u32 pcap);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
@@ -352,6 +355,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+void opal_powercap_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index a0d4353..f9ec36d 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-powercap.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
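
The ABI text in this patch says a value written to powercap-current must lie between powercap-min and powercap-max. The following is a minimal sketch of that range rule only, assuming a rejected request leaves the current cap unchanged; the struct, the function name and the choice of `-EINVAL` are hypothetical, and the source does not say whether the kernel or the firmware performs the check.

```c
#include <errno.h>

/* Hypothetical in-memory view of the three sysfs attributes, in Watts. */
struct toy_powercap {
	unsigned int min_w;
	unsigned int max_w;
	unsigned int cur_w;
};

/* Accept a new powercap only if it lies within [min_w, max_w]. */
static int toy_powercap_request(struct toy_powercap *pc, unsigned int req_w)
{
	if (req_w < pc->min_w || req_w > pc->max_w)
		return -EINVAL; /* out of range: current cap stays as it was */

	pc->cur_w = req_w;
	return 0;
}
```

A caller would read the minimum and maximum first and clamp or reject user input before issuing the request.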
 

[PATCH V11 2/3] powernv: Add support to set power-shifting-ratio

2017-08-09 Thread Shilpasri G Bhat
This patch adds support for setting the power-shifting-ratio, which hints to
the firmware how to distribute/throttle power between different entities
in a system (e.g. CPU vs. GPU). This ratio is used by the OCC for its
power-capping algorithm.

Signed-off-by: Shilpasri G Bhat 
---
 Documentation/ABI/testing/sysfs-firmware-opal-psr |  18 +++
 arch/powerpc/include/asm/opal-api.h   |   2 +
 arch/powerpc/include/asm/opal.h   |   3 +
 arch/powerpc/platforms/powernv/Makefile   |   2 +-
 arch/powerpc/platforms/powernv/opal-psr.c | 175 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S|   2 +
 arch/powerpc/platforms/powernv/opal.c |   3 +
 7 files changed, 204 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c
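
For reference, a minimal user-space sketch of how the new attribute might be
exercised. It assumes only what the ABI text in this patch states: the
/sys/firmware/opal/psr/cpu_to_gpu_X naming and the 0-100 range. The helper
names (psr_attr_path, psr_ratio_valid, psr_write) are hypothetical and are
not part of the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build the sysfs path for the chip with id chip_id. */
int psr_attr_path(char *buf, size_t len, unsigned int chip_id)
{
	int n = snprintf(buf, len, "/sys/firmware/opal/psr/cpu_to_gpu_%u",
			 chip_id);

	/* Report truncation or an encoding error as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* The ABI text gives the ratio as a value in the range 0-100. */
int psr_ratio_valid(unsigned int ratio)
{
	return ratio <= 100;
}

/* Hypothetical helper: write a ratio to an attribute file; 0 on success. */
int psr_write(const char *path, unsigned int ratio)
{
	FILE *f;
	int ok;

	if (!psr_ratio_valid(ratio))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", ratio) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path for a chip with psr_attr_path() and pass it to
psr_write(); values above 100 are rejected before the file is opened.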

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI/testing/sysfs-firmware-opal-psr
new file mode 100644
index 000..cc2ece7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-psr
@@ -0,0 +1,18 @@
+What:  /sys/firmware/opal/psr
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Power-Shift-Ratio directory for Powernv P9 servers
+
+   Power-Shift-Ratio allows providing hints to the firmware
+   to shift/throttle power between different entities in
+   the system. Each attribute in this directory indicates
+   a settable PSR.
+
+What:  /sys/firmware/opal/psr/cpu_to_gpu_X
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   PSR sysfs attributes for Powernv P9 servers
+
+   Power-Shift-Ratio between CPU and GPU for a given chip
+   with chip-id X. This file gives the ratio (0-100)
+   which is used by OCC for power-capping.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index b87305b..0cb7d11 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -196,6 +196,8 @@
 #define OPAL_IMC_COUNTERS_STOP 151
 #define OPAL_GET_POWERCAP  152
 #define OPAL_SET_POWERCAP  153
+#define OPAL_GET_POWER_SHIFT_RATIO 154
+#define OPAL_SET_POWER_SHIFT_RATIO 155
 #define OPAL_PCI_SET_P2P   157
 #define OPAL_LAST  157
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 6f09ab7..d87ffcb 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -277,6 +277,8 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
 
 int opal_get_powercap(u32 handle, int token, u32 *pcap);
 int opal_set_powercap(u32 handle, int token, u32 pcap);
+int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
+int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -356,6 +358,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 void opal_wake_poller(void);
 
 void opal_powercap_init(void);
+void opal_psr_init(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index f9ec36d..674ed1e 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o opal-powercap.o
+obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c
new file mode 100644
index 000..7313b7f
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-psr.c
@@ -0,0 +1,175 @@
+/*
+ * PowerNV OPAL Power-Shift-Ratio interface
+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "opal-psr: " fmt
+
+#include 
+#include 
+#include 
+
+#include 
+
+DEFINE_MUTEX(psr_mutex);
+
+static struct kobject *psr_kobj;
+
+struct psr_attr {
+   u32 handle;
+   struct kobj_attribute attr;
+} *psr_attrs;
+
+static ssize_t psr_show(struct 

[PATCH V11 1/3] powernv: powercap: Add support for powercap framework

2017-08-09 Thread Shilpasri G Bhat
Add a generic powercap framework to change the system powercap
in-band through the OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat 
---
 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 arch/powerpc/include/asm/opal-api.h|   3 +
 arch/powerpc/include/asm/opal.h|   5 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/opal.c  |   4 +
 7 files changed, 290 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
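
A rough user-space sketch of the write rule described in the ABI text of this
patch: a request written to powercap-current must lie between powercap-min and
powercap-max. The helper names below are hypothetical, the bounds are treated
as inclusive (an assumption), and the directory argument is expected to be
something like /sys/firmware/opal/powercap/system-powercap.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned decimal value from a sysfs file. */
int powercap_read(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%u", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* A request (in Watts) must lie within [min, max]; inclusive by assumption. */
int powercap_request_ok(unsigned int min, unsigned int max, unsigned int req)
{
	return req >= min && req <= max;
}

/* Hypothetical helper: validate req against min/max, then request it. */
int powercap_set(const char *dir, unsigned int req)
{
	char path[256];
	unsigned int min, max;
	FILE *f;
	int ok;

	snprintf(path, sizeof(path), "%s/powercap-min", dir);
	if (powercap_read(path, &min))
		return -1;
	snprintf(path, sizeof(path), "%s/powercap-max", dir);
	if (powercap_read(path, &max))
		return -1;
	if (!powercap_request_ok(min, max, req))
		return -1;

	snprintf(path, sizeof(path), "%s/powercap-current", dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", req) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Checking the range in user space only avoids an obviously invalid write; the
firmware remains the authority on whether a request is accepted.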

diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
new file mode 100644
index 000..c9b66ec
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap
@@ -0,0 +1,31 @@
+What:  /sys/firmware/opal/powercap
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   Powercap directory for Powernv (P8, P9) servers
+
+   Each folder in this directory contains a
+   power-cappable component.
+
+What:  /sys/firmware/opal/powercap/system-powercap
+   /sys/firmware/opal/powercap/system-powercap/powercap-min
+   /sys/firmware/opal/powercap/system-powercap/powercap-max
+   /sys/firmware/opal/powercap/system-powercap/powercap-current
+Date:  August 2017
+Contact:   Linux for PowerPC mailing list 
+Description:   System powercap directory and attributes applicable for
+   Powernv (P8, P9) servers
+
+   This directory provides powercap information. It
+   contains below sysfs attributes:
+
+   - powercap-min : This file provides the minimum
+ possible powercap in Watt units
+
+   - powercap-max : This file provides the maximum
+ possible powercap in Watt units
+
+   - powercap-current : This file provides the current
+ powercap set on the system. Writing to this file
+ creates a request for setting a new-powercap. The
+ powercap requested must be between powercap-min
+ and powercap-max.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index ced1ef2..b87305b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -42,6 +42,7 @@
 #define OPAL_I2C_STOP_ERR  -24
 #define OPAL_XIVE_PROVISIONING -31
 #define OPAL_XIVE_FREE_ACTIVE  -32
+#define OPAL_TIMEOUT   -33
 
 /* API Tokens (in r0) */
 #define OPAL_INVALID_CALL -1
@@ -193,6 +194,8 @@
 #define OPAL_IMC_COUNTERS_INIT 149
 #define OPAL_IMC_COUNTERS_START150
 #define OPAL_IMC_COUNTERS_STOP 151
+#define OPAL_GET_POWERCAP  152
+#define OPAL_SET_POWERCAP  153
 #define OPAL_PCI_SET_P2P   157
 #define OPAL_LAST  157
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 5a715e6..6f09ab7 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -275,6 +275,9 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
 int64_t opal_imc_counters_start(uint32_t type, uint64_t cpu_pir);
 int64_t opal_imc_counters_stop(uint32_t type, uint64_t cpu_pir);
 
+int opal_get_powercap(u32 handle, int token, u32 *pcap);
+int opal_set_powercap(u32 handle, int token, u32 pcap);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
@@ -352,6 +355,8 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_wake_poller(void);
 
+void opal_powercap_init(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_OPAL_H */
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index a0d4353..f9ec36d 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o
+obj-y  += opal-kmsg.o opal-powercap.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git 

[PATCH V11 0/3] powernv : Add support for OPAL-OCC command/response interface

2017-08-09 Thread Shilpasri G Bhat
In P9, the OCC (On-Chip-Controller) supports a shared-memory-based
command-response interface. Within the shared memory there is an OPAL
command buffer and an OCC response buffer that can be used to send
in-band commands to the OCC. The following commands are supported:

1) Set system powercap
2) Set CPU-GPU power shifting ratio
3) Clear min/max for OCC sensor groups

Changes from V10:
- Rebased on powerpc-next
- Add sysfs interface instead of IOCTL
  (Skiboot patch for Patch3 is posted below:
  https://lists.ozlabs.org/pipermail/skiboot/2017-August/008553.html )

Shilpasri G Bhat (3):
  powernv: powercap: Add support for powercap framework
  powernv: Add support to set power-shifting-ratio
  powernv: Add support to clear sensor groups data

 .../ABI/testing/sysfs-firmware-opal-powercap   |  31 +++
 Documentation/ABI/testing/sysfs-firmware-opal-psr  |  18 ++
 .../bindings/powerpc/opal/sensor-groups.txt|  27 +++
 arch/powerpc/include/asm/opal-api.h|   6 +
 arch/powerpc/include/asm/opal.h|  10 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 arch/powerpc/platforms/powernv/opal-powercap.c | 244 +
 arch/powerpc/platforms/powernv/opal-psr.c  | 175 +++
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 212 ++
 arch/powerpc/platforms/powernv/opal-wrappers.S |   5 +
 arch/powerpc/platforms/powernv/opal.c  |  10 +
 11 files changed, 739 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-sensor-groups.c

-- 
1.8.3.1



[PATCH V11 3/3] powernv: Add support to clear sensor groups data

2017-08-09 Thread Shilpasri G Bhat
Add support for clearing different sensor groups. OCC in-band sensor
groups such as CSM, Profiler and Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.

Signed-off-by: Shilpasri G Bhat 
---
 .../bindings/powerpc/opal/sensor-groups.txt|  27 +++
 arch/powerpc/include/asm/opal-api.h|   1 +
 arch/powerpc/include/asm/opal.h|   2 +
 arch/powerpc/platforms/powernv/Makefile|   2 +-
 .../powerpc/platforms/powernv/opal-sensor-groups.c | 212 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c  |   3 +
 7 files changed, 247 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
 create mode 100644 arch/powerpc/platforms/powernv/opal-sensor-groups.c
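
To illustrate the binding added by this patch, here is a small stand-alone
sketch of how a consumer might check the "ops" property of a sensor-group
node for the clear operation. The struct and function names are hypothetical
and do not come from the driver; only the property names and the
OPAL_SENSOR_GROUP_CLEAR value (156) are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Token value from the opal-api.h hunk of this patch. */
#define OPAL_SENSOR_GROUP_CLEAR 156

/* Hypothetical in-memory view of one /ibm,opal/sensor-groups child node. */
struct sensor_group {
	const char *type;	/* "type" property */
	uint32_t group_id;	/* "sensor-group-id" property */
	uint32_t chip_id;	/* "ibm,chip-id" property */
	const uint32_t *ops;	/* "ops" property: OPAL call numbers */
	size_t nr_ops;		/* number of entries in ops */
};

/* Return 1 if the group's "ops" array advertises the given OPAL call. */
int sensor_group_supports(const struct sensor_group *sg, uint32_t opal_call)
{
	size_t i;

	for (i = 0; i < sg->nr_ops; i++)
		if (sg->ops[i] == opal_call)
			return 1;
	return 0;
}
```

A group would only be offered a clear operation when
sensor_group_supports(sg, OPAL_SENSOR_GROUP_CLEAR) returns 1.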

diff --git a/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
new file mode 100644
index 000..6ad881c
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt
@@ -0,0 +1,27 @@
+IBM OPAL Sensor Groups Binding
+---
+
+Node: /ibm,opal/sensor-groups
+
+Description: Contains sensor groups available in the Powernv P9
+servers. Each child node indicates a sensor group.
+
+- compatible : Should be "ibm,opal-sensor-group"
+
+Each child node contains below properties:
+
+- type : String to indicate the type of sensor-group
+
+- sensor-group-id: Abstract unique identifier provided by firmware of
+  type  which is used for sensor-group
+  operations like clearing the min/max history of all
+  sensors belonging to the group.
+
+- ibm,chip-id : Chip ID
+
+- sensors : Phandle array of child nodes of /ibm,opal/sensor/
+   belonging to this group
+
+- ops : Array of opal-call numbers indicating available operations on
+   sensor groups like clearing min/max, enabling/disabling sensor
+   group.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0cb7d11..450a60b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -198,6 +198,7 @@
 #define OPAL_SET_POWERCAP  153
 #define OPAL_GET_POWER_SHIFT_RATIO 154
 #define OPAL_SET_POWER_SHIFT_RATIO 155
+#define OPAL_SENSOR_GROUP_CLEAR156
 #define OPAL_PCI_SET_P2P   157
 #define OPAL_LAST  157
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index d87ffcb..97ff192 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -279,6 +279,7 @@ int64_t opal_imc_counters_init(uint32_t type, uint64_t address,
 int opal_set_powercap(u32 handle, int token, u32 pcap);
 int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
+int opal_sensor_group_clear(u32 group_hndl, int token);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
@@ -359,6 +360,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
 
 void opal_powercap_init(void);
 void opal_psr_init(void);
+void opal_sensor_groups_init(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 674ed1e..177b3d4 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y   += setup.o opal-wrappers.o opal.o opal-async.o idle.o
 obj-y  += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y  += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
 obj-y  += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o
+obj-y  += opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
 
 obj-$(CONFIG_SMP)  += smp.o subcore.o subcore-asm.o
 obj-$(CONFIG_PCI)  += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-sensor-groups.c b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
new file mode 100644
index 000..7e5a235
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-sensor-groups.c
@@ -0,0 +1,212 @@
+/*
+ * PowerNV OPAL Sensor-groups interface
+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "opal-sensor-groups: " fmt
+
+#include 
+#include 
+#include 
+

Re: [PATCH] iio: accel: Bugfix to enbale and allow different events to work parallely.

2017-08-09 Thread Harinath Nampally

On 08/09/2017 09:37 AM, Jonathan Cameron wrote:
> On Mon, 31 Jul 2017 07:17:38 -0400
> Harinath Nampally wrote:
>
>> This driver supports multiple devices like mma8653, mma8652, mma8452,
>> mma8453 and fxls8471. Almost all of these devices have more than one
>> event. The current driver design hardcodes the event-specific
>> information, so only one event can be supported by this driver, and the
>> current design doesn't have the flexibility to add more events.
>>
>> This patch fixes this by detaching the event-related information from
>> the chip_info struct; based on channel type and event direction, the
>> corresponding event configuration registers are picked dynamically.
>> Hence multiple events can be handled in the read/write callbacks.
>>
>> Changes are thoroughly tested on an fxls8471 device on an imx6UL Eval
>> board using the iio_event_monitor user space program.
>>
>> After this fix both Freefall and Transient events are handled by the
>> driver without any conflicts.
>>
>> Signed-off-by: Harinath Nampally
>
> Just a quick process point before I catch up with the rest of the thread.
> Please ensure you put the driver name in the patch title.  We have a lot
> of accelerometers these days and doing that will help draw the attention
> of people who care!

Sure, I will update it. Thanks.
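
For readers following the thread, the idea in the patch description (pick the
event configuration registers from channel type and event direction rather
than from per-chip data) can be sketched as below. This is not the patch's
code: the enum and function names are hypothetical, the direction-to-event
mapping is an assumption, and the register offsets are quoted from memory of
the MMA8452 register map and should be checked against the datasheet.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the IIO channel-type and event-direction enums. */
enum chan_type { CHAN_ACCEL, CHAN_OTHER };
enum ev_dir { EV_DIR_RISING, EV_DIR_FALLING, EV_DIR_EITHER };

/* One set of event configuration registers. */
struct ev_regs {
	unsigned char cfg;	/* configuration register */
	unsigned char src;	/* source (status) register */
	unsigned char ths;	/* threshold register */
	unsigned char count;	/* debounce count register */
};

/* Offsets as recalled from the MMA8452 register map; verify before use. */
static const struct ev_regs ev_regs_freefall  = { 0x15, 0x16, 0x17, 0x18 };
static const struct ev_regs ev_regs_transient = { 0x1d, 0x1e, 0x1f, 0x20 };

/*
 * Select the register set per event instead of per chip.
 * Assumed mapping: rising -> transient, falling -> freefall.
 * Returns NULL for combinations that have no event registers.
 */
const struct ev_regs *get_event_regs(enum chan_type type, enum ev_dir dir)
{
	if (type != CHAN_ACCEL)
		return NULL;

	switch (dir) {
	case EV_DIR_RISING:
		return &ev_regs_transient;
	case EV_DIR_FALLING:
		return &ev_regs_freefall;
	default:
		return NULL;
	}
}
```

With a lookup of this shape, the read and write callbacks can serve both
events, since neither register set is tied to the chip description.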




Re: [PATCH] f2fs: introduce cur_reserved_blocks in sysfs

2017-08-09 Thread Chao Yu
On 2017/8/8 21:43, Yunlong Song wrote:
> In this patch, we add a new sysfs interface, we can use it to gradually
> achieve the reserved_blocks finally, even when reserved_blocks is initially
> set over user_block_count - total_valid_block_count. This is very useful,
> especially when we upgrade kernel with new reserved_blocks value, but old
> disk image unluckily has user_block_count - total_valid_block_count smaller
> than the desired reserved_blocks. With this patch, f2fs can try its best to
> reserve space and get close to the reserved_blocks, and the current value of
> achieved reserved_blocks can be shown in real time.

Oh, this looks like a soft limit in a quota system, whereas the original
reserved_blocks implementation behaves like a hard one, so this patch changes
the semantics of reserved_blocks.

Actually, I suspect it would be hard to reserve all the remaining free space
in a real user scenario now: system users' activity may depend on free space
in the data partition because they need to create files, so without a
uid/gid reserved-block feature, soft reservation will block any activity of
system users, such as Android.
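
To make the concern concrete, here is a small user-space model of the
accounting in the quoted f2fs.h hunks. It is only a sketch: locking and the
per-inode bookkeeping are left out, the struct and function names are
hypothetical, and the initial value of cur_reserved_blocks (capped by the
space that is currently free) is an assumption, since that part of the patch
is not quoted here.

```c
#include <errno.h>

typedef unsigned int block_t;

/* User-space model of the counters touched by the quoted hunks. */
struct sb_model {
	block_t user_block_count;
	block_t total_valid_block_count;
	block_t reserved_blocks;	/* configured target */
	block_t cur_reserved_blocks;	/* reservation achieved so far */
};

static block_t min_blk(block_t a, block_t b)
{
	return a < b ? a : b;
}

/* Assumption: the initial reservation is capped by the currently free space. */
void model_init(struct sb_model *sbi, block_t user, block_t valid,
		block_t reserved)
{
	sbi->user_block_count = user;
	sbi->total_valid_block_count = valid;
	sbi->reserved_blocks = reserved;
	sbi->cur_reserved_blocks = min_blk(reserved, user - valid);
}

/* Models inc_valid_block_count(): try to allocate *count blocks. */
int model_inc(struct sb_model *sbi, block_t *count)
{
	block_t avail, diff;

	sbi->total_valid_block_count += *count;
	avail = sbi->user_block_count - sbi->cur_reserved_blocks;
	if (sbi->total_valid_block_count > avail) {
		diff = sbi->total_valid_block_count - avail;
		if (diff > *count)	/* guard added for this model only */
			diff = *count;
		*count -= diff;
		sbi->total_valid_block_count -= diff;
		if (!*count)
			return -ENOSPC;
	}
	return 0;
}

/* Models dec_valid_block_count(): freed blocks refill the reservation. */
void model_dec(struct sb_model *sbi, block_t count)
{
	sbi->total_valid_block_count -= count;
	sbi->cur_reserved_blocks = min_blk(sbi->reserved_blocks,
					   sbi->cur_reserved_blocks + count);
}
```

With user_block_count = 100, 95 valid blocks and reserved_blocks = 10, the
model starts with cur_reserved_blocks = 5. Freeing 3 blocks raises it to 8,
and a following 1-block allocation then fails with ENOSPC: the space that was
just freed has been absorbed by the reservation.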

Thanks,

> 
> Signed-off-by: Yunlong Song 
> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs |  6 ++
>  fs/f2fs/f2fs.h  |  9 +++--
>  fs/f2fs/super.c |  4 +++-
>  fs/f2fs/sysfs.c | 15 ++-
>  4 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
> index 11b7f4e..bdbb9f3 100644
> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> @@ -151,3 +151,9 @@ Date: August 2017
>  Contact: "Jaegeuk Kim" 
>  Description:
>Controls sleep time of GC urgent mode
> +
> +What:/sys/fs/f2fs//cur_reserved_blocks
> +Date:August 2017
> +Contact: "Yunlong Song" 
> +Description:
> +  Shows current reserved blocks in system.
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index cea329f..3b7056f 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1040,6 +1040,7 @@ struct f2fs_sb_info {
>   block_t discard_blks;   /* discard command candidats */
>   block_t last_valid_block_count; /* for recovery */
>   block_t reserved_blocks;/* configurable reserved blocks */
> + block_t cur_reserved_blocks;/* current reserved blocks */
>  
>   u32 s_next_generation;  /* for NFS support */
>  
> @@ -1514,7 +1515,7 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
>  
>   spin_lock(&sbi->stat_lock);
>   sbi->total_valid_block_count += (block_t)(*count);
> - avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks;
> + avail_user_block_count = sbi->user_block_count - sbi->cur_reserved_blocks;
>   if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
>   diff = sbi->total_valid_block_count - avail_user_block_count;
>   *count -= diff;
> @@ -1548,6 +1549,8 @@ static inline void dec_valid_block_count(struct f2fs_sb_info *sbi,
>   f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
>   f2fs_bug_on(sbi, inode->i_blocks < sectors);
>   sbi->total_valid_block_count -= (block_t)count;
> + sbi->cur_reserved_blocks = min(sbi->reserved_blocks,
> +                                sbi->cur_reserved_blocks + count);
>   spin_unlock(&sbi->stat_lock);
>   f2fs_i_blocks_write(inode, count, false, true);
>  }
> @@ -1694,7 +1697,7 @@ static inline int inc_valid_node_count(struct f2fs_sb_info *sbi,
>   spin_lock(&sbi->stat_lock);
>  
>   valid_block_count = sbi->total_valid_block_count + 1;
> - if (unlikely(valid_block_count + sbi->reserved_blocks >
> + if (unlikely(valid_block_count + sbi->cur_reserved_blocks >
>   sbi->user_block_count)) {
>   spin_unlock(&sbi->stat_lock);
>   goto enospc;
> @@ -1737,6 +1740,8 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
>  
>   sbi->total_valid_node_count--;
>   sbi->total_valid_block_count--;
> + sbi->cur_reserved_blocks = min(sbi->reserved_blocks,
> +                                sbi->cur_reserved_blocks + 1);
>  
>   spin_unlock(&sbi->stat_lock);
>  
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 4c1bdcb..2934aa2 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -957,7 +957,7 @@ static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
>   buf->f_blocks = total_count - start_count;
>   buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
>   

Re: [PATCH RESEND1 00/12] ALSA: vsnd: Add Xen para-virtualized frontend driver

2017-08-09 Thread Takashi Sakamoto

Hi,

On Aug 7 2017 21:22, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

This patch series adds support for Xen [1] para-virtualized
sound frontend driver. It implements the protocol from
include/xen/interface/io/sndif.h with the following limitations:
- mute/unmute is not supported
- get/set volume is not supported
Volume control is not supported for the reason that most of the
use-cases (at the moment) are based on scenarios where
unprivileged OS (e.g. Android, AGL etc) use software mixers.

Both capture and playback are supported.

Thank you,
Oleksandr

Resending because of rebase onto [2] + added missing patch

[1] https://xenproject.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/log/?h=for-next

Oleksandr Andrushchenko (12):
   ALSA: vsnd: Introduce Xen para-virtualized sound frontend driver
   ALSA: vsnd: Implement driver's probe/remove
   ALSA: vsnd: Implement Xen bus state handling
   ALSA: vsnd: Read sound driver configuration from Xen store
   ALSA: vsnd: Implement Xen event channel handling
   ALSA: vsnd: Implement handling of shared buffers
   ALSA: vsnd: Introduce ALSA virtual sound driver
   ALSA: vsnd: Initialize virtul sound card
   ALSA: vsnd: Add timer for period interrupt emulation
   ALSA: vsnd: Implement ALSA PCM operations
   ALSA: vsnd: Implement communication with backend
   ALSA: vsnd: Introduce Kconfig option to enable Xen PV sound

  sound/drivers/Kconfig     |   12 +
  sound/drivers/Makefile    |    2 +
  sound/drivers/xen-front.c | 2107 +
  3 files changed, 2121 insertions(+)
  create mode 100644 sound/drivers/xen-front.c


For this patchset, I have the same concern that Clemens Ladisch
raised[1]. If I understand your explanation of the queueing between
Dom0 and DomU correctly, the concern can be put briefly: this driver
works without any synchronization to the data transmission performed
by the actual sound hardware.

In the design of the ALSA PCM core, drivers are expected to synchronize
to the actual hardware for semi-realtime data transmission. The
synchronization relies on two points:
1) Interrupts raised in response to events from the actual hardware.
2) The position of the actual data transmission on any serial sound
   interface of the actual hardware.

These two points come from the typical design of actual hardware; they
are not unfair, unreasonable or intrusive demands from the software
side.

In a typical para-virtualization design, it is hard for Dom0 to give
DomU a sufficient abstraction of the sound hardware on these two
points. In particular, it cannot abstract point 2) at all, because the
position value must be accurate against the actual time frame, while
there is an overhead for DomU to read it. By the time DomU handles the
value, it is already well out of date due to context switches between
Dom0 and DomU. Therefore, this driver must rely on point 1) to
synchronize to the actual sound hardware. Typically, drivers configure
the hardware to generate an interrupt per period of the PCM buffer.
This means that this driver should notify Dom0 of the period size
requested by applications.

'include/xen/interface/io/sndif.h' has none of the functionality I
described above:
1. A notification from DomU to Dom0 about the period size for
   interrupts from the actual hardware, or a way for Dom0 to tell DomU
   the configured period size.
2. Notifications to DomU of the interrupts from the actual hardware.

For these reasons, your driver uses the kernel's timer interface to
generate 'pseudo' interrupts for this purpose. However, that depends on
Dom0's abstraction, which differs from the sound hardware, and on the
Linux kernel's abstraction of timer functionality. In this case, the
gap between the 'actual' interrupts from the hardware and the 'pseudo'
interrupts produced by a combination of several components brings
unexpected results in several situations.
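
To make the gap concrete, here is a rough sketch in C. It is not taken
from the patchset. It assumes a timer that fires once per nominal
period and hardware whose real sample rate differs slightly from the
nominal one; the function names and all numbers are illustrative
assumptions.

```c
#include <stdint.h>

/* Nanoseconds taken by one period of `frames` frames at `rate_hz`. */
static uint64_t period_ns(uint32_t frames, uint32_t rate_hz)
{
        return (uint64_t)frames * 1000000000ULL / rate_hz;
}

/*
 * Frames by which a timer-driven position estimate differs from the
 * hardware position after `elapsed_sec` seconds, when the timer assumes
 * `nominal_hz` but the hardware really consumes `actual_hz` frames per
 * second. A positive result means the hardware is ahead of the timer.
 */
static int64_t drift_frames(uint32_t nominal_hz, uint32_t actual_hz,
                            uint32_t elapsed_sec)
{
        return ((int64_t)actual_hz - (int64_t)nominal_hz) * elapsed_sec;
}
```

With 1024 frames per period at a nominal 48000 Hz, period_ns() gives
21333333 ns. If the hardware actually runs at 48005 Hz, drift_frames()
gives 300 frames after 60 seconds, and nothing in a pure timer scheme
corrects that.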

I think this is a defect of the 'sndif' interface on the Xen side. I
think it would be better for you to work with the Xen community to
improve the above interface first, and then work on the Linux side.


Additionally, next time, please keep several points in mind:
 * When the first patch adds the initial code for a driver, it should
   include the entries for Makefile and Kconfig, so that the driver can
   be built even if it is still in an initial shape. Each patch should
   be self-contained and in a shape that lets developers bisect easily.
   In other words, your first patch[2] should include the modifications
   for Makefile and Kconfig that are in your last patch[3].
 * When any read-only symbol is added, it should have the 'const'
   qualifier so that the symbol is placed in the .rodata section of the
   ELF binary. For example, in your code, 'alsa_sndif_formats' is such
   a symbol. In recent Linux development, some developers have been
   working on constifying such symbols. Please keep in mind their
   continuous work upstream[4].
 * You can split your driver to several files. In
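
On the 'const' point above, a minimal sketch in C of what a constified
lookup table looks like. The struct, the table name and the numeric
values are hypothetical and are not the real contents of
'alsa_sndif_formats'.

```c
#include <stddef.h>

/* Hypothetical mapping entry; field names are illustrative only. */
struct fmt_map {
        unsigned int alsa;
        unsigned int sndif;
};

/* 'static const' lets the toolchain place the table in .rodata. */
static const struct fmt_map fmt_table[] = {
        { 2, 4 },       /* made-up values */
        { 10, 8 },      /* made-up values */
};

/* Look up the sndif value for an ALSA format; returns -1 if unknown. */
static int fmt_to_sndif(unsigned int alsa)
{
        size_t i;

        for (i = 0; i < sizeof(fmt_table) / sizeof(fmt_table[0]); i++)
                if (fmt_table[i].alsa == alsa)
                        return (int)fmt_table[i].sndif;
        return -1;
}
```

Because the table is never written after initialization, adding
'const' costs nothing and lets the compiler reject accidental writes.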
   

Re: [PATCH] usb: gadget: udc: renesas_usb3: fix error return code in renesas_usb3_probe()

2017-08-09 Thread Gustavo A. R. Silva

Hi Yoshihiro,

On 08/09/2017 06:44 AM, Yoshihiro Shimoda wrote:

Hi Gustavo,

Thank you for the patch!



I'm glad to help :)


-----Original Message-----
From: Gustavo A. R. Silva
Sent: Wednesday, August 9, 2017 7:35 AM

platform_get_irq() returns an error code, but the renesas_usb3 driver
ignores it and always returns -ENODEV. This is not correct, and it
prevents -EPROBE_DEFER from being propagated properly.
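
The effect can be sketched in user-space C. This is only an
illustration: fake_get_irq() is a hypothetical stand-in for
platform_get_irq(), and probe_old()/probe_new() are made-up names that
model the behaviour before and after the patch. EPROBE_DEFER is
kernel-internal, so it is defined locally here.

```c
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517        /* kernel-internal errno, defined for user space */
#endif

/* Hypothetical stand-in: returns an IRQ number or a negative errno. */
static int fake_get_irq(int result)
{
        return result;
}

/* Old behaviour: every failure is flattened to -ENODEV. */
static int probe_old(int result)
{
        int irq = fake_get_irq(result);

        if (irq < 0)
                return -ENODEV;
        return 0;
}

/* Patched behaviour: the caller sees the original error code. */
static int probe_new(int result)
{
        int irq = fake_get_irq(result);

        if (irq < 0)
                return irq;
        return 0;
}
```

With the old code, a deferral request from the IRQ lookup reaches the
driver core as -ENODEV, so the probe is not retried later; with the
patched code the -EPROBE_DEFER value survives.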


Thank you for pointing this out. I got it.


Also, notice that platform_get_irq() no longer returns 0 on error:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e330b9a6bb35dc7097a4f02cb1ae7b6f96df92af


I don't think this explanation is needed.
After this is removed,

Acked-by: Yoshihiro Shimoda 



Thank you
--
Gustavo A. R. Silva


Best regards,
Yoshihiro Shimoda


Print an error message and propagate the return value of
platform_get_irq() on failure.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/usb/gadget/udc/renesas_usb3.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c
index e1de8fe..616d053 100644
--- a/drivers/usb/gadget/udc/renesas_usb3.c
+++ b/drivers/usb/gadget/udc/renesas_usb3.c
@@ -2468,8 +2468,10 @@ static int renesas_usb3_probe(struct platform_device *pdev)
        priv = match->data;

        irq = platform_get_irq(pdev, 0);
-       if (irq < 0)
-               return -ENODEV;
+       if (irq < 0) {
+               dev_err(&pdev->dev, "Failed to get IRQ: %d\n", irq);
+               return irq;
+       }

        usb3 = devm_kzalloc(&pdev->dev, sizeof(*usb3), GFP_KERNEL);
        if (!usb3)
--
2.5.0




