date:20180415

KASAN: use-after-free Read in tipc_nametbl_stop

2018-04-15 Thread syzbot


Hello,

syzbot hit the following crash on net-next commit
5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=d64b64afc55660106556


So far this crash happened 5 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6319968803094528
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=6099825221173248
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=4953018151731200
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d64b64afc55660106...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

Failed to remove local publication {0,0,0}/20641
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
==
BUG: KASAN: use-after-free in tipc_service_delete net/tipc/name_table.c:751 [inline]
BUG: KASAN: use-after-free in tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780

Read of size 8 at addr 8801c4c25130 by task kworker/u4:2/30

CPU: 0 PID: 30 Comm: kworker/u4:2 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 tipc_service_delete net/tipc/name_table.c:751 [inline]
 tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780
 tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
 cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411

Allocated by task 4535:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 tipc_service_create_range net/tipc/name_table.c:183 [inline]
 tipc_service_insert_publ net/tipc/name_table.c:207 [inline]
 tipc_nametbl_insert_publ+0x569/0x1910 net/tipc/name_table.c:371
 tipc_nametbl_publish+0x6c3/0xba0 net/tipc/name_table.c:618
 tipc_sk_publish+0x22a/0x510 net/tipc/socket.c:2604
 tipc_bind+0x206/0x330 net/tipc/socket.c:647
 __sys_bind+0x331/0x440 net/socket.c:1484
 SYSC_bind net/socket.c:1495 [inline]
 SyS_bind+0x24/0x30 net/socket.c:1493
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 30:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 tipc_service_remove_publ.isra.8+0x909/0xc30 net/tipc/name_table.c:283
 tipc_service_delete net/tipc/name_table.c:753 [inline]
 tipc_nametbl_stop+0x746/0xd70 net/tipc/name_table.c:780
 tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
 cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411

The buggy address belongs to the object at 8801c4c25100
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 48 bytes inside of
 64-byte region [8801c4c25100, 8801c4c25140)
The buggy address belongs to the page:
page:ea0007130940 count:1 mapcount:0 mapping:8801c4c25000 index:0x0
flags: 0x2fffc000100(slab)
raw: 02fffc000100 8801c4c25000  00010020
raw: ea0006ccf860 ea00070840a0 8801dac00340 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8801c4c25000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 8801c4c25080: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>8801c4c25100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                 ^
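
The offset arithmetic in the report can be checked with a short sketch. The addresses are copied from the report as the archive prints them (their upper digits are missing, which does not change the difference). The 8-byte shadow granularity is an assumption based on how generic KASAN usually maps memory to shadow bytes, and the variable names are illustrative only.

```python
# Addresses as printed in the report above (upper digits missing in the archive).
buggy_addr = 0x8801c4c25130
object_start = 0x8801c4c25100
object_size = 64  # the object lives in the kmalloc-64 cache

# Distance of the faulting read from the start of the freed object.
offset = buggy_addr - object_start
assert 0 <= offset < object_size

# Assumption: generic KASAN tracks 8 bytes of memory per shadow byte.
SHADOW_GRANULE = 8
shadow_index = offset // SHADOW_GRANULE

print(f"offset inside object: {offset} bytes")
print(f"shadow byte index within the row: {shadow_index}")
```

Under these assumptions the read lands in the last 16 bytes of the freed 64-byte object, which matches the "48 bytes inside of 64-byte region" line.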

Wrong module .text address in 4.16.0

2018-04-15 Thread Thomas-Mich Richter

I just installed 4.16.0 and discovered the module .text address is
wrong. It happens on s390 and x86 platforms. I have not tested others.

Here is the issue, I have used module qeth_l2 on s390 which is the
ethernet device driver:

[root@s35lp76 ~]# lsmod
Module                  Size  Used by
qeth_l2                94208  1
...

[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000   < This is the correct address in memory
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363  < This is the wrong address
[root@s35lp76 ~]#

File /sys/module/qeth_l2/sections/.text displays a very strange
address which is definitely wrong. It should be something like
0x03ff80401xxx.

Same on x86.

I have checked file kernel/module.c function add_sect_attrs()
and it calls module_sect_show() when the sysfs file is read.
And module_sect_show() uses 

  sprintf(buf, "0x%pK\n", (void *)sattr->address);

and my sysctl setting should be correct:
[root@s35lp76 linux]# sysctl -a | fgrep kernel.kptr_restrict
kernel.kptr_restrict = 0
[root@s35lp76 linux]#

I wonder if somebody else has seen this issue?
Ideas how to fix this?

Thanks
-- 
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Court of registration: Amtsgericht Stuttgart, HRB 243294
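
The comparison described in the message above can be sketched as follows. The sample strings are copied from the output in the message; the helper name `module_base` is hypothetical, and the sketch reads no files, so it runs on any machine.

```python
def module_base(proc_modules_line: str) -> int:
    """Return the load address from one /proc/modules line.

    The six whitespace-separated fields are: name, size, use count,
    dependency list, state, and load address.
    """
    fields = proc_modules_line.split()
    return int(fields[5], 16)


# Sample values copied from the message above.
proc_line = "qeth_l2 94208 1 - Live 0x03ff80401000"
sysfs_text = "0x18ea8363"

base = module_base(proc_line)
size = int(proc_line.split()[1])
text_addr = int(sysfs_text, 16)

# A plausible .text address lies inside [base, base + size).
inside = base <= text_addr < base + size
print(f"base=0x{base:x} .text=0x{text_addr:x} inside module: {inside}")
```

With the values from the message, the sysfs value falls outside the module's address range, which is the mismatch being reported.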

Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system

2018-04-15 Thread poza


On 2018-04-16 11:03, p...@codeaurora.org wrote:

On 2018-04-16 08:47, Bjorn Helgaas wrote:

On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:


You indicated that you want to unify the AER and DPC behavior. Let's
settle on what we want to do one more time. We have been going back
and forth on the direction.


My thinking is that as much as possible, similar events should be
handled similarly, whether the mechanism is AER, DPC, EEH, etc.
Ideally, drivers shouldn't have to be aware of which mechanism is in
use.

Error recovery includes conventional PCI as well, but right now I
think we're only concerned with PCIe.  The following error types are
from PCIe r4.0, sec 6.2.2:

  ERR_COR
    Corrected by hardware with no software intervention.  Software
    involved for logging only.

    Handled by AER via pci_error_handlers; DPC is never involved.

    Link is unaffected.

  ERR_NONFATAL
    A transaction is unreliable but the link is fully functional.

    If DPC is not supported, handled by AER via pci_error_handlers and
    the link is unaffected.

    If DPC supported, handled by DPC (because we set
    PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.

  ERR_FATAL
    The link is unreliable.

    If DPC is not supported, handled by AER via pci_error_handlers and
    the link is reset.

    If DPC supported, handled by DPC via remove/re-enumerate.

It doesn't seem right to me that we handle both ERR_NONFATAL and
ERR_FATAL events differently if we happen to have DPC support in a
switch.

Maybe we should consider triggering DPC only on ERR_FATAL?  That would
keep DPC out of the ERR_NONFATAL cases.

For ERR_FATAL, maybe we should bite the bullet and use
remove/re-enumerate for AER as well as for DPC.  That would be painful
for higher-level software, but if we're willing to accept that pain
for new systems that support DPC, maybe life would be better overall
if it worked the same way on systems without DPC?

Bjorn


This had crossed my mind when I first looked at the code.
DPC is triggered for both the ERR_NONFATAL and ERR_FATAL cases.
I thought the primary purpose of DPC was to recover from fatal errors by
triggering HW recovery.
But what if some platform wants to handle both FATAL and NON_FATAL with
DPC?


As you said, AER FATAL cases and DPC FATAL cases should be handled
similarly, e.g. by removing and re-enumerating the devices.

In the NON_FATAL case, only AER would come into the picture.
If some platform would like to handle NON_FATAL with DPC, then it should
follow the AER NON_FATAL path (which does not do remove/re-enumerate).

Where hotplug is enabled, remove/re-enumerate makes more sense
in the case of ERR_FATAL.
Where hotplug is disabled, only re-enumeration is
required (no need to remove the devices).
But then do we need to handle this case specifically? What is the harm
in removing the devices in all cases, followed by re-enumeration?


To clarify the last line: what I meant was that in the case of ERR_FATAL we
can always remove/re-enumerate the devices, irrespective of whether hotplug
is enabled.


In the case of ERR_NONFATAL, DPC will follow the AER path (where it just
tries to recover),
although I am not sure how to handle the ERR_NONFATAL case if
hotplug is enabled, because, as Keith suggested, the device might have been
changed at run time.




Regards,
Oza.
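
The handling described in the quoted summary, and the change floated in the discussion, can be written down as a small decision table. This is only a sketch of the thread's wording: the function names `current_handling` and `proposed_handling` are hypothetical, the "proposed" variant encodes the "maybe" suggestions in the thread rather than an agreed design, and none of this is kernel code.

```python
from enum import Enum
from typing import Tuple


class Severity(Enum):
    ERR_COR = "ERR_COR"
    ERR_NONFATAL = "ERR_NONFATAL"
    ERR_FATAL = "ERR_FATAL"


def current_handling(severity: Severity, dpc_supported: bool) -> Tuple[str, str]:
    """(mechanism, action) as described in the quoted summary."""
    if severity is Severity.ERR_COR:
        # DPC is never involved for corrected errors.
        return ("AER", "pci_error_handlers; logging only; link unaffected")
    if dpc_supported:
        # With DPC, both uncorrectable severities take the same path.
        return ("DPC", "remove/re-enumerate")
    if severity is Severity.ERR_NONFATAL:
        return ("AER", "pci_error_handlers; link unaffected")
    return ("AER", "pci_error_handlers; link reset")


def proposed_handling(severity: Severity, dpc_supported: bool) -> Tuple[str, str]:
    """Suggestion from the thread: DPC only on ERR_FATAL, and ERR_FATAL
    always uses remove/re-enumerate, whether AER or DPC handles it."""
    if severity is Severity.ERR_COR:
        return ("AER", "pci_error_handlers; logging only; link unaffected")
    if severity is Severity.ERR_NONFATAL:
        return ("AER", "pci_error_handlers; link unaffected")
    mechanism = "DPC" if dpc_supported else "AER"
    return (mechanism, "remove/re-enumerate")


for dpc in (False, True):
    for sev in Severity:
        print(f"{sev.value:<13} dpc={dpc!s:<5} "
              f"now={current_handling(sev, dpc)} "
              f"proposed={proposed_handling(sev, dpc)}")
```

In the proposed variant the action depends only on the severity, never on whether DPC is present, which is the "similar events handled similarly" goal stated in the thread.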

Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system

2018-04-15 Thread poza


On 2018-04-16 08:47, Bjorn Helgaas wrote:

On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:


You indicated that you want to unify the AER and DPC behavior. Let's
settle on what we want to do one more time. We have been going back
and forth on the direction.


My thinking is that as much as possible, similar events should be
handled similarly, whether the mechanism is AER, DPC, EEH, etc.
Ideally, drivers shouldn't have to be aware of which mechanism is in
use.

Error recovery includes conventional PCI as well, but right now I
think we're only concerned with PCIe.  The following error types are
from PCIe r4.0, sec 6.2.2:

  ERR_COR
    Corrected by hardware with no software intervention.  Software
    involved for logging only.

    Handled by AER via pci_error_handlers; DPC is never involved.

    Link is unaffected.

  ERR_NONFATAL
    A transaction is unreliable but the link is fully functional.

    If DPC is not supported, handled by AER via pci_error_handlers and
    the link is unaffected.

    If DPC supported, handled by DPC (because we set
    PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.

  ERR_FATAL
    The link is unreliable.

    If DPC is not supported, handled by AER via pci_error_handlers and
    the link is reset.

    If DPC supported, handled by DPC via remove/re-enumerate.

It doesn't seem right to me that we handle both ERR_NONFATAL and
ERR_FATAL events differently if we happen to have DPC support in a
switch.

Maybe we should consider triggering DPC only on ERR_FATAL?  That would
keep DPC out of the ERR_NONFATAL cases.

For ERR_FATAL, maybe we should bite the bullet and use
remove/re-enumerate for AER as well as for DPC.  That would be painful
for higher-level software, but if we're willing to accept that pain
for new systems that support DPC, maybe life would be better overall
if it worked the same way on systems without DPC?

Bjorn


This had crossed my mind when I first looked at the code.
DPC is triggered for both the ERR_NONFATAL and ERR_FATAL cases.
I thought the primary purpose of DPC was to recover from fatal errors by
triggering HW recovery.
But what if some platform wants to handle both FATAL and NON_FATAL with
DPC?


As you said, AER FATAL cases and DPC FATAL cases should be handled
similarly, e.g. by removing and re-enumerating the devices.

In the NON_FATAL case, only AER would come into the picture.
If some platform would like to handle NON_FATAL with DPC, then it should
follow the AER NON_FATAL path (which does not do remove/re-enumerate).


Where hotplug is enabled, remove/re-enumerate makes more sense in
the case of ERR_FATAL.
Where hotplug is disabled, only re-enumeration is required
(no need to remove the devices).
But then do we need to handle this case specifically? What is the harm
in removing the devices in all cases, followed by re-enumeration?


Regards,
Oza.

linux-next: Tree for Apr 16

2018-04-15 Thread Stephen Rothwell

Hi all,

Changes since 20180413:

The bpf tree gained a build failure for which I applied a patch.

Non-merge commits (relative to Linus' tree): 379
 366 files changed, 7652 insertions(+), 4560 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (60cc43fc8884 Linux 4.17-rc1)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add 
correct dependency to Makefile)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (04552a693d60 ARM: kexec: record parent context 
registers for non-crash CPUs)
Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax 
ARM_SMCCC_ARCH_WORKAROUND_1 discovery)
Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" 
comment)
Merging powerpc-fixes/fixes (81b654c27391 powerpc/64s: Fix CPU_FTRS_ALWAYS vs 
DT CPU features)
Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (c246fd333f84 filter.txt: update 'tools/net/' to 
'tools/bpf/')
Merging bpf/master (700475af1bb5 Merge branch 'bpf-sockmap-fixes')
Applying: fix for "bpf: sockmap, map_release does not hold refcnt for pinned 
maps"
Merging ipsec/master (4b66af2d6356 af_key: Always verify length of provided 
sadb_key)
Merging netfilter/master (cf43ae63c024 netfilter: xt_connmark: Add bit mapping 
for bit-shift operation.)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (77e30e10ee28 iwlwifi: mvm: query regdb for wmm 
rule if needed)
Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging rdma-fixes/for-rc (84652aefb347 RDMA/ucma: Introduce safer 
rdma_addr_size() variants)
Merging sound-current/for-linus (7ecb46e9ee9a ALSA: line6: Use correct endpoint 
type for midi output)
Merging pci-current/for-linus (adf58458bcb2 PCI: Remove messages about 
reassigning resources)
Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 
'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: 
add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add 
support for Harman FirmwareHubEmulator)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (59fba0869aca phy: qcom-ufs: add MODULE_LICENSE tag)
Merging staging.current/staging-linus (df34df483a97 Merge tag 
'staging-4.17-rc1' of

type for midi output)
Merging pci-current/for-linus (adf58458bcb2 PCI: Remove messages about 
reassigning resources)
Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 
'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: 
add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add 
support for Harman FirmwareHubEmulator)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (59fba0869aca phy: qcom-ufs: add MODULE_LICENSE tag)
Merging staging.current/staging-linus (df34df483a97 Merge tag 
'staging-4.17-rc1' of

Re: [PATCHv5] gpio: Remove VLA from gpiolib

2018-04-15 Thread Phil Reid


On 16/04/2018 13:19, Phil Reid wrote:

G'day Laura,

One more comment.
On 16/04/2018 12:41, Phil Reid wrote:

G'day Laura,

On 14/04/2018 05:24, Laura Abbott wrote:

The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chips with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner 
Signed-off-by: Lukas Wunner 
Signed-off-by: Laura Abbott 
---
v5: Dropped some outdated comments and extra whitespace. Switched to
ARCH_NR_GPIOS per suggestion of Linus Walleij.
---
  drivers/gpio/gpiolib.c    | 76 +--
  drivers/gpio/gpiolib.h    |  2 +-
  include/linux/gpio/consumer.h | 10 +++---
  3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d66de67ef307..79ec7a29b684 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = {
  .name = "gpio",
  };
+/*
+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO ARCH_NR_GPIOS


Also wouldn't this mean that fast path will never be triggered now...

Just to be clearer: this condition will always be true:
(chip->ngpio <= FASTPATH_NGPIO)
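
To put rough numbers on that observation, here is a small userspace sketch (not kernel code). It assumes ARCH_NR_GPIOS is 512, which I believe is the asm-generic default but which architectures can override. The macros are re-declared locally and both helper functions are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros; 512 is an assumed value. */
#define ARCH_NR_GPIOS    512
#define FASTPATH_NGPIO   ARCH_NR_GPIOS
#define BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes of stack taken by fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)]. */
static size_t fastpath_stack_bytes(void)
{
	return 2 * BITS_TO_LONGS(FASTPATH_NGPIO) * sizeof(unsigned long);
}

/* Returns 1 when a chip with ngpio lines would use the stack array. */
static int takes_fast_path(unsigned int ngpio)
{
	return ngpio <= FASTPATH_NGPIO;
}
```

With these assumptions the stack array costs 128 bytes per call, and takes_fast_path() only returns 0 for a line count above ARCH_NR_GPIOS. If no chip can register with that many lines, the kcalloc() branch would be dead code.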




+
  /* gpio_lock prevents conflicts during gpio_desc[] table updates.
   * While any GPIO is requested, its gpio_chip is not removable;
   * each GPIO's "requested" flag serves as a lock and refcount.
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned 
int cmd,
  vals[i] = !!ghd.values[i];
  /* Reuse the array setting function */
-    gpiod_set_array_value_complex(false,
+    return gpiod_set_array_value_complex(false,
    true,
    lh->numdescs,
    lh->descs,
    vals);
-    return 0;
  }
  return -EINVAL;
  }
@@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, 
void *data,
  goto err_free_descs;
  }
+    if (chip->ngpio > FASTPATH_NGPIO)
+    chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n",
+    chip->ngpio, FASTPATH_NGPIO);
+
  gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL);
  if (!gdev->label) {
  status = -ENOMEM;
@@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
  while (i < array_size) {
  struct gpio_chip *chip = desc_array[i]->gdev->chip;
-    unsigned long mask[BITS_TO_LONGS(chip->ngpio)];
-    unsigned long bits[BITS_TO_LONGS(chip->ngpio)];
+    unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
+    unsigned long *mask, *bits;
  int first, j, ret;
+    if (likely(chip->ngpio <= FASTPATH_NGPIO)) {
+    memset(fastpath, 0, sizeof(fastpath));
+    mask = fastpath;
+    bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);

Previously it looks like just mask was zeroed.
So could this just be:
   memset(mask, 0, BITS_TO_LONGS(chip->ngpio));

I'm guessing it's not a huge additional overhead as it is, but it's more in 
line with what was there.



+    } else {
+    mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio),
+   sizeof(*mask),
+   can_sleep ? GFP_KERNEL : GFP_ATOMIC);
+    if (!mask)
+    return -ENOMEM;
+    bits = mask + BITS_TO_LONGS(chip->ngpio);
+    }
+
  if (!can_sleep)
  WARN_ON(chip->can_sleep);
  /* collect all inputs belonging to the same chip */
  first = i;
-    memset(mask, 0, sizeof(mask));
  do {
  const struct gpio_desc *desc = desc_array[i];
  int hwgpio = gpio_chip_hwgpio(desc);
@@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
   (desc_array[i]->gdev->chip == chip));
  ret = gpio_chip_get_multiple(chip, mask, bits);
-    if (ret)
+    if (ret) {
+    if (mask != fastpath)
+    kfree(mask);
  return ret;
+    }
  for (j = first; j < i; j++) {
  const struct gpio_desc *desc = desc_array[j];
@@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
  value_array[j] = value;
  trace_gpio_value(desc_to_gpio(desc), 1, value);
  }
+
+    if (mask != fastpath)
+    kfree(mask);
  }
  return 0;
  }
@@ -2878,7 +2904,7 @@ static void gpio_chip_set_multiple(struct gpio_chip *chip,
  }
  }
-void

[PATCH] mailbox: arm_mhu: add support for mhuv2

2018-04-15 Thread Samarth Parikh

ARM has launched the next version of the MHU, MHUv2, with its latest
subsystems. The main change is that the MHUv2 is now a distributed IP
with different peripheral views (registers) for the sender and receiver.

Another main difference is that MHUv1 duplex channels are now split into
simplex/half duplex in MHUv2. MHUv2 has a configurable number of
communication channels. There is a capability register (MSG_NO_CAP) to
find out how many channels are available in a system.

The register offsets have also changed for STAT, SET & CLEAR registers
from 0x0, 0x8 & 0x10 in MHUv1 to 0x0, 0xC & 0x8 in MHUv2 respectively.

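0x0     0x4     0x8     0xC             0x1F
---------------------------------------------
| STAT  |       |       | SET   |       |   |
---------------------------------------------
              Transmit Channel

0x0     0x4     0x8     0xC             0x1F
---------------------------------------------
| STAT  |       | CLR   |       |       |   |
---------------------------------------------
              Receive Channel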
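
The offset change described above can be sketched as a per-version lookup table. This is a standalone userspace illustration, not the patch itself: the enum names are borrowed from the diff, while the table mhu_reg_ofs and the helper mhu_reg_addr() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum mhu_ver  { MHU_V1 = 1, MHU_V2, MHU_VER_END };
enum mhu_regs { MHU_REG_STAT, MHU_REG_SET, MHU_REG_CLR, MHU_REG_END };

/* Offsets as stated in the commit message; the table layout is illustrative. */
static const int mhu_reg_ofs[MHU_VER_END][MHU_REG_END] = {
	[MHU_V1] = { [MHU_REG_STAT] = 0x0, [MHU_REG_SET] = 0x8, [MHU_REG_CLR] = 0x10 },
	[MHU_V2] = { [MHU_REG_STAT] = 0x0, [MHU_REG_SET] = 0xC, [MHU_REG_CLR] = 0x8 },
};

/* Hypothetical helper: address of a register for a given controller version. */
static uintptr_t mhu_reg_addr(uintptr_t base, enum mhu_ver ver, enum mhu_regs reg)
{
	assert(ver >= MHU_V1 && ver < MHU_VER_END);
	assert(reg < MHU_REG_END);
	return base + (uintptr_t)mhu_reg_ofs[ver][reg];
}
```

Indexing by version keeps the interrupt and send paths free of version checks: callers ask for STAT, SET or CLR and the table supplies the right offset.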

The MHU controller can request the receiver to wake up, and once the
request is removed, the receiver may go back to sleep, but the MHU
itself does not actively put a receiver to sleep.

So, to wake up the receiver when the sender wants to send data, the
sender first has to set the ACCESS_REQUEST register; the receiver's
state can then be detected using the ACCESS_READY register.
ACCESS_REQUEST is at offset 0xF88 and ACCESS_READY is at offset 0xF8C,
and both are accessible only on a sender channel.
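
The wake-up sequence described above can be sketched against a fake register frame. This is only a model under stated assumptions: bit 0 is assumed to be the request/ready bit, the receiver is modelled as answering immediately, and struct fake_mhu_tx, fake_readl(), fake_writel() and mhu_v2_send() are hypothetical names, not part of the driver. When the request is dropped again is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define MHU_V2_REG_SET_OFS     0xC
#define MHU_V2_REG_ACC_REQ_OFS 0xF88
#define MHU_V2_REG_ACC_RDY_OFS 0xF8C

/* Fake 4 KiB sender frame standing in for the ioremap()ed tx base. */
struct fake_mhu_tx {
	uint32_t regs[0x1000 / sizeof(uint32_t)];
	int receiver_responds;	/* test knob, not a hardware feature */
};

static uint32_t fake_readl(const struct fake_mhu_tx *f, unsigned int ofs)
{
	return f->regs[ofs / sizeof(uint32_t)];
}

static void fake_writel(struct fake_mhu_tx *f, uint32_t val, unsigned int ofs)
{
	f->regs[ofs / sizeof(uint32_t)] = val;
	/* Model only: a responsive receiver mirrors the request into READY. */
	if (ofs == MHU_V2_REG_ACC_REQ_OFS && f->receiver_responds)
		f->regs[MHU_V2_REG_ACC_RDY_OFS / sizeof(uint32_t)] = val & 1;
}

/* Returns 0 once data was written to SET, -1 if READY never appeared. */
static int mhu_v2_send(struct fake_mhu_tx *f, uint32_t data, int max_polls)
{
	assert(f);
	fake_writel(f, 1, MHU_V2_REG_ACC_REQ_OFS);	/* ask receiver to wake up */
	while (!(fake_readl(f, MHU_V2_REG_ACC_RDY_OFS) & 1)) {
		if (max_polls-- <= 0)
			return -1;	/* give up instead of spinning forever */
	}
	fake_writel(f, data, MHU_V2_REG_SET_OFS);
	return 0;
}
```

The bounded poll is a design choice of the sketch: a sender that spins without a limit would hang if the receiver never signals ready.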

This patch adds the changes required to support both the older
version of the MHU and the latest MHUv2 controller. It also needs an
update to the DT binding for ARM MHU, as we need a second register base
(tx base) which would be used as the send channel base.

Signed-off-by: Samarth Parikh 
---
 drivers/mailbox/arm_mhu.c | 163 ++
 1 file changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c
index 99befa7..d8825c5 100644
--- a/drivers/mailbox/arm_mhu.c
+++ b/drivers/mailbox/arm_mhu.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #define INTR_STAT_OFS  0x0
 #define INTR_SET_OFS   0x8
@@ -33,12 +35,69 @@
 #define MHU_SEC_OFFSET 0x200
 #define TX_REG_OFFSET  0x100

+#define MHU_V2_REG_STAT_OFS    0x0
+#define MHU_V2_REG_CLR_OFS 0x8
+#define MHU_V2_REG_SET_OFS 0xC
+#define MHU_V2_REG_MSG_NO_CAP  0xF80
+#define MHU_V2_REG_ACC_REQ_OFS 0xF88
+#define MHU_V2_REG_ACC_RDY_OFS 0xF8C
+
+#define MHU_V2_LP_OFFSET  0x20
+#define MHU_V2_HP_OFFSET  0x0
+
 #define MHU_CHANS  3

+enum mhu_ver {
+   MHU_V1 = 1,
+   MHU_V2,
+   MHU_VER_END
+};
+
+enum mhu_regs {
+   MHU_REG_STAT,
+   MHU_REG_SET,
+   MHU_REG_CLR,
+   MHU_REG_END
+};
+
+enum mhu_access_regs {
+   MHU_REG_MSG_NO_CAP,
+   MHU_REG_ACC_REQ,
+   MHU_REG_ACC_RDY,
+   MHU_REG_ACC_END
+};
+
+enum mhu_channels {
+   MHU_CHAN_LOW,
+   MHU_CHAN_HIGH,
+   MHU_CHAN_SEC,
+   MHU_CHAN_END
+};
+
+/**
+ * ARM MHU Mailbox device specific data
+ *
+ * @regs: MHU version specific array of register offset for STAT,
+ *SET & CLEAR registers.
+ * @chans: MHU version specific array of channel offset for Low
+ * Priority, High Priority & Secure channels.
+ * @acc_regs: An array of access register offsets.
+ * @tx_reg_off: Offset for TX register.
+ * @version: Version of MHU controller available in the system.
+ */
+struct mhu_data {
+   int regs[MHU_REG_END]; /* STAT, SET, CLEAR */
+   int chans[MHU_CHAN_END]; /* LP, HP, Sec */
+   int acc_regs[MHU_REG_ACC_END];
+   long int tx_reg_off;
+   uint8_t version;
+};
+
 struct mhu_link {
unsigned irq;
void __iomem *tx_reg;
void __iomem *rx_reg;
+   unsigned int pchan;
 };

 struct arm_mhu {
@@ -46,21 +105,24 @@ struct arm_mhu {
struct mhu_link mlink[MHU_CHANS];
struct mbox_chan chan[MHU_CHANS];
struct mbox_controller mbox;
+   struct mhu_data *drvdata;
 };

 static irqreturn_t mhu_rx_interrupt(int irq, void *p)
 {
struct mbox_chan *chan = p;
struct mhu_link *mlink = chan->con_priv;
+   struct arm_mhu *mhu = container_of(chan->mbox, struct arm_mhu, mbox);
+   struct mhu_data *mdata = mhu->drvdata;
u32 val;

-   val = readl_relaxed(mlink->rx_reg + INTR_STAT_OFS);
+   val = readl_relaxed(mlink->rx_reg + mdata->regs[MHU_REG_STAT]);
if (!val)
return IRQ_NONE;

mbox_chan_received_data(chan, (void *)&val);

-   writel_relaxed(val, mlink->rx_reg + INTR_CLR_OFS);
+   writel_relaxed(val, mlink->rx_reg + mdata->regs[MHU_REG_CLR]);

return IRQ_HANDLED;
 }
@@ -68,7 +130,9 @@ static irqreturn_t mhu_rx_interrupt(int irq, void *p)
 static bool mhu_last_tx_done(struct mbox_chan *chan)
 {
struct mhu_link *mlink = chan->con_priv;
-   u32 val = readl_relaxed(mlink->tx_reg + INTR_STAT_OFS);
+   struct arm_mhu

Re: [PATCHv5] gpio: Remove VLA from gpiolib

2018-04-15 Thread Phil Reid


G'day Laura,

One more comment.
On 16/04/2018 12:41, Phil Reid wrote:

G'day Laura,

On 14/04/2018 05:24, Laura Abbott wrote:

The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chips with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner 
Signed-off-by: Lukas Wunner 
Signed-off-by: Laura Abbott 
---
v5: Dropped some outdated comments and extra whitespace. Switched to
ARCH_NR_GPIOS per suggestion of Linus Walleij.
---
  drivers/gpio/gpiolib.c    | 76 +--
  drivers/gpio/gpiolib.h    |  2 +-
  include/linux/gpio/consumer.h | 10 +++---
  3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d66de67ef307..79ec7a29b684 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = {
  .name = "gpio",
  };
+/*
+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO ARCH_NR_GPIOS


Also wouldn't this mean that fast path will never be triggered now...



+
  /* gpio_lock prevents conflicts during gpio_desc[] table updates.
   * While any GPIO is requested, its gpio_chip is not removable;
   * each GPIO's "requested" flag serves as a lock and refcount.
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned 
int cmd,
  vals[i] = !!ghd.values[i];
  /* Reuse the array setting function */
-    gpiod_set_array_value_complex(false,
+    return gpiod_set_array_value_complex(false,
    true,
    lh->numdescs,
    lh->descs,
    vals);
-    return 0;
  }
  return -EINVAL;
  }
@@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, 
void *data,
  goto err_free_descs;
  }
+    if (chip->ngpio > FASTPATH_NGPIO)
+    chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n",
+    chip->ngpio, FASTPATH_NGPIO);
+
  gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL);
  if (!gdev->label) {
  status = -ENOMEM;
@@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
  while (i < array_size) {
  struct gpio_chip *chip = desc_array[i]->gdev->chip;
-    unsigned long mask[BITS_TO_LONGS(chip->ngpio)];
-    unsigned long bits[BITS_TO_LONGS(chip->ngpio)];
+    unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
+    unsigned long *mask, *bits;
  int first, j, ret;
+    if (likely(chip->ngpio <= FASTPATH_NGPIO)) {
+    memset(fastpath, 0, sizeof(fastpath));
+    mask = fastpath;
+    bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);

Previously it looks like just mask was zeroed.
So could this just be:
   memset(mask, 0, BITS_TO_LONGS(chip->ngpio));

I'm guessing it's not a huge additional overhead as it is, but it's more in 
line with what was there.
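
One detail if the mask-only clear is used: memset() takes a length in bytes, so the count of longs has to be multiplied by sizeof(unsigned long). Below is a userspace sketch of the stack fast path with a heap fallback and a mask-only clear. It is an illustration, not the patch: FASTPATH_NGPIO is given an assumed value of 512, calloc()/free() stand in for kcalloc()/kfree(), and count_marked() is a hypothetical helper that pretends every line reads back high.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define FASTPATH_NGPIO   512	/* assumed value, for illustration only */
#define BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Hypothetical helper: mark the given lines in a mask sized for ngpio,
 * fake a read where every line is high, and count the masked lines that
 * read back set. Returns -1 on allocation failure.
 */
static int count_marked(unsigned int ngpio, const unsigned int *lines, size_t n)
{
	unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
	unsigned long *mask, *bits;
	size_t nlongs = BITS_TO_LONGS(ngpio);
	size_t i;
	int count = 0;

	if (ngpio <= FASTPATH_NGPIO) {
		mask = fastpath;
		bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);
		/* Length is in bytes: clear only the longs this chip needs. */
		memset(mask, 0, nlongs * sizeof(*mask));
	} else {
		mask = calloc(2 * nlongs, sizeof(*mask));
		if (!mask)
			return -1;
		bits = mask + nlongs;
	}

	for (i = 0; i < n; i++) {
		assert(lines[i] < ngpio);
		mask[lines[i] / BITS_PER_LONG] |= 1UL << (lines[i] % BITS_PER_LONG);
	}

	/* Stand-in for the chip read: bits is fully written, so it needs no clear. */
	for (i = 0; i < nlongs; i++)
		bits[i] = ~0UL;

	for (i = 0; i < ngpio; i++) {
		unsigned long m = mask[i / BITS_PER_LONG] & bits[i / BITS_PER_LONG];

		count += (int)((m >> (i % BITS_PER_LONG)) & 1);
	}

	if (mask != fastpath)
		free(mask);
	return count;
}
```

Whether bits can really stay uncleared in the kernel depends on gpio_chip_get_multiple() only reading back the positions set in mask, which is an assumption of this sketch.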



+    } else {
+    mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio),
+   sizeof(*mask),
+   can_sleep ? GFP_KERNEL : GFP_ATOMIC);
+    if (!mask)
+    return -ENOMEM;
+    bits = mask + BITS_TO_LONGS(chip->ngpio);
+    }
+
  if (!can_sleep)
  WARN_ON(chip->can_sleep);
  /* collect all inputs belonging to the same chip */
  first = i;
-    memset(mask, 0, sizeof(mask));
  do {
  const struct gpio_desc *desc = desc_array[i];
  int hwgpio = gpio_chip_hwgpio(desc);
@@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
   (desc_array[i]->gdev->chip == chip));
  ret = gpio_chip_get_multiple(chip, mask, bits);
-    if (ret)
+    if (ret) {
+    if (mask != fastpath)
+    kfree(mask);
  return ret;
+    }
  for (j = first; j < i; j++) {
  const struct gpio_desc *desc = desc_array[j];
@@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
  value_array[j] = value;
  trace_gpio_value(desc_to_gpio(desc), 1, value);
  }
+
+    if (mask != fastpath)
+    kfree(mask);
  }
  return 0;
  }
@@ -2878,7 +2904,7 @@ static void gpio_chip_set_multiple(struct gpio_chip *chip,
  }
  }
-void gpiod_set_array_value_complex(bool raw, bool can_sleep,
+int gpiod_set_array_value_complex(bool raw, bool can_sleep,

Re: [PATCH v4 08/15] KVM: s390: interfaces to (de)configure guest's AP matrix

2018-04-15 Thread kbuild test robot

Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on v4.17-rc1 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-guest-dedicated-crypto-adapters/20180416-052759
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-alldefconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390

All errors (new ones prefixed by >>):

   arch/s390/kvm/kvm-ap.o: In function `kvm_ap_matrix_create':
>> kvm-ap.c:(.text+0x176): undefined reference to `ap_query_configuration'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip

nds32 build failures

2018-04-15 Thread Guenter Roeck


I thought I should give the brand new architecture a try. Unfortunately, that 
was not very successful.

Build reference: v4.17-rc1
gcc version: nds32le-elf-gcc (GCC) 7.3.0

Building nds32:defconfig ... failed

arch/nds32/include/asm/nds32.h: In function 'GIE_ENABLE':
arch/nds32/include/asm/nds32.h:25:2: error: implicit declaration of function 
'__nds32__gie_en'; did you mean '__nds32__'?

arch/nds32/include/asm/nds32.h: In function 'CACHE_SET':
arch/nds32/include/asm/nds32.h:38:18: error: implicit declaration of function 
'__nds32__mfsr'; did you mean '__nds32__'?

arch/nds32/include/asm/nds32.h:38:32: error: 'NDS32_SR_ICM_CFG' undeclared

arch/nds32/include/asm/nds32.h:41:32: error: 'NDS32_SR_DCM_CFG'

Am I missing something ?

Guenter

Re: [linux-sunxi] [PATCH v2 00/10] Allwinner H3 DVFS support

2018-04-15 Thread Chen-Yu Tsai

On Mon, Apr 16, 2018 at 12:41 PM, Chen-Yu Tsai  wrote:
> Hi,
>
> On Tue, Feb 6, 2018 at 12:48 PM, Icenowy Zheng  wrote:
>> This patchset tries to add DVFS support for Allwinner H3 SoC,
>> considering two kinds of adjustable regulators used on H3 boards:
>> SY8106A I2C-controlled regulator and SY8113B regulator (controllable
>> by GPIO with some special designs on the board), and also taking the
>> uncontrollable boards into consideration.
>>
>> PATCH 1 and PATCH 2 are for the SY8106A regulator, then PATCH 3 and
>> PATCH 4 are for the r_i2c bus, which is used by boards with SY8106A
>> to communicate with the regulator.
>>
>> PATCH 5 adds the operating points v2 table to the H3 SoC, but with
>> OPPs higher than 1008MHz temporarily dropped.
>>
>> Then there's patches for several tested boards: Orange Pi PC (with
>> SY8106A), Orange Pi One/Zero (with GPIO-adjustable SY8113B) and
>> ALL-H3-CC (unadjustable).
>>
>> Icenowy Zheng (5):
>>   ARM: sun8i: h3: add operating-points-v2 table for CPU
>>   ARM: sun8i: h2+: add SY8113B regulator used by Orange Pi Zero board
>>   ARM: sun8i: h3: add SY8113B regulator used by Orange Pi One board
>>   ARM: sun8i: h3: fix ALL-H3-CC H3 ver VDD-CPUX voltage
>>   ARM: sun8i: h3: set the cpu-supply to VDD-CPUX on ALL-H3-CC H3 ver
>>
>> Ondrej Jirman (5):
>>   dt-bindings: add binding for the SY8106A voltage regulator
>>   regulator: add support for SY8106A regulator
>>   ARM: sunxi: h3/h5: Add r_i2c pinmux node
>>   ARM: sunxi: h3/h5: Add r_i2c I2C controller
>>   ARM: sun8i: h3: Add SY8106A regulator to Orange Pi PC
>
> I've applied all the device tree patches for 4.18, taking into account
> comments from Maxime. See
>
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.17

I meant


https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.18

of course...

> Mostly it's just renaming the regulator node names and labels.
>
> Please resend the first two patches to Mark Brown, the regulator
> subsystem maintainer. And you might want to mention the branch
> above in case he needs a use case reference.
>
> Regards
> ChenYu

Re: [PATCHv5] gpio: Remove VLA from gpiolib

2018-04-15 Thread Phil Reid


G'day Laura,

On 14/04/2018 05:24, Laura Abbott wrote:

The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chip with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner 
Signed-off-by: Lukas Wunner 
Signed-off-by: Laura Abbott 
---
v5: Dropped some outdated comments and extra whitespace. Switched to
ARCH_NR_GPIOS per suggestion of Linus Walleij.
---
  drivers/gpio/gpiolib.c        | 76 +--
  drivers/gpio/gpiolib.h        |  2 +-
  include/linux/gpio/consumer.h | 10 +++---
  3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d66de67ef307..79ec7a29b684 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = {
.name = "gpio",
  };
  
+/*

+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO ARCH_NR_GPIOS
+
  /* gpio_lock prevents conflicts during gpio_desc[] table updates.
   * While any GPIO is requested, its gpio_chip is not removable;
   * each GPIO's "requested" flag serves as a lock and refcount.
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned 
int cmd,
vals[i] = !!ghd.values[i];
  
  		/* Reuse the array setting function */

-   gpiod_set_array_value_complex(false,
+   return gpiod_set_array_value_complex(false,
  true,
  lh->numdescs,
  lh->descs,
  vals);
-   return 0;
}
return -EINVAL;
  }
@@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, 
void *data,
goto err_free_descs;
}
  
+	if (chip->ngpio > FASTPATH_NGPIO)
+		chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n",
+		chip->ngpio, FASTPATH_NGPIO);
+
gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL);
if (!gdev->label) {
status = -ENOMEM;
@@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
  
  	while (i < array_size) {

struct gpio_chip *chip = desc_array[i]->gdev->chip;
-   unsigned long mask[BITS_TO_LONGS(chip->ngpio)];
-   unsigned long bits[BITS_TO_LONGS(chip->ngpio)];
+   unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
+   unsigned long *mask, *bits;
int first, j, ret;
  
+		if (likely(chip->ngpio <= FASTPATH_NGPIO)) {

+   memset(fastpath, 0, sizeof(fastpath));
+   mask = fastpath;
+   bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);

Previously it looks like just mask was zeroed.
So could this just be:
  memset(mask, 0, BITS_TO_LONGS(chip->ngpio));

I'm guessing it's not a huge additional overhead as it is, but it's more in 
line with what was there.



+   } else {
+   mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio),
+  sizeof(*mask),
+  can_sleep ? GFP_KERNEL : GFP_ATOMIC);
+   if (!mask)
+   return -ENOMEM;
+   bits = mask + BITS_TO_LONGS(chip->ngpio);
+   }
+
if (!can_sleep)
WARN_ON(chip->can_sleep);
  
  		/* collect all inputs belonging to the same chip */

first = i;
-   memset(mask, 0, sizeof(mask));
do {
const struct gpio_desc *desc = desc_array[i];
int hwgpio = gpio_chip_hwgpio(desc);
@@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
 (desc_array[i]->gdev->chip == chip));
  
  		ret = gpio_chip_get_multiple(chip, mask, bits);

-   if (ret)
+   if (ret) {
+   if (mask != fastpath)
+   kfree(mask);
return ret;
+   }
  
  		for (j = first; j < i; j++) {

const struct gpio_desc *desc = desc_array[j];
@@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool 
can_sleep,
value_array[j] = value;
trace_gpio_value(desc_to_gpio(desc), 1, value);
}

+
+   if (mask != fastpath)
+
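
For reference, the fast-path/slow-path pattern discussed in this review can be sketched in userspace C. This is only an illustration under stated assumptions, not the gpiolib code: count_selected() is a hypothetical helper, FASTPATH_NGPIO is set to an arbitrary 512 instead of ARCH_NR_GPIOS, calloc()/free() stand in for kcalloc()/kfree(), and only the mask bitmap is modelled. It zeroes just the mask words in use, as suggested above, but note that memset() takes a size in bytes, so the word count from BITS_TO_LONGS() has to be multiplied by sizeof(*mask).

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define FASTPATH_NGPIO   512	/* hypothetical stand-in for ARCH_NR_GPIOS */

/*
 * Count the distinct line numbers in lines[] for a chip with ngpio lines.
 * Returns the count, or -1 on allocation failure or an out-of-range line.
 */
int count_selected(unsigned int ngpio, const unsigned int *lines, size_t nlines)
{
	unsigned long fastpath[BITS_TO_LONGS(FASTPATH_NGPIO)];
	unsigned long *mask;
	int count = 0;
	size_t i;

	if (ngpio <= FASTPATH_NGPIO) {
		/* Fast path: fixed-size stack array, no allocation. */
		mask = fastpath;
		/* Zero only the words in use; memset() counts bytes. */
		memset(mask, 0, BITS_TO_LONGS(ngpio) * sizeof(*mask));
	} else {
		/* Slow path: zeroed heap allocation sized for this chip. */
		mask = calloc(BITS_TO_LONGS(ngpio), sizeof(*mask));
		if (!mask)
			return -1;
	}

	for (i = 0; i < nlines; i++) {
		unsigned long bit = 1UL << (lines[i] % BITS_PER_LONG);
		unsigned long *word;

		if (lines[i] >= ngpio) {
			count = -1;
			break;
		}
		word = &mask[lines[i] / BITS_PER_LONG];
		if (!(*word & bit)) {
			*word |= bit;
			count++;
		}
	}

	/* Every exit path after the allocation must release the slow-path buffer. */
	if (mask != fastpath)
		free(mask);
	return count;
}
```

Whether a second bitmap (bits in the patch) also needs clearing depends on what the callee writes before it is read; the sketch leaves that question out.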

Re: [linux-sunxi] [PATCH v2 00/10] Allwinner H3 DVFS support

2018-04-15 Thread Chen-Yu Tsai

Hi,

On Tue, Feb 6, 2018 at 12:48 PM, Icenowy Zheng  wrote:
> This patchset tries to add DVFS support for Allwinner H3 SoC,
> considering two kinds of adjustable regulators used on H3 boards:
> SY8106A I2C-controlled regulator and SY8113B regulator (controllable
> by GPIO with some special designs on the board), and also taking the
> uncontrollable boards into consideration.
>
> PATCH 1 and PATCH 2 are for the SY8106A regulator, then PATCH 3 and
> PATCH 4 are for the r_i2c bus, which is used by boards with SY8106A
> to communicate with the regulator.
>
> PATCH 5 adds the operating points v2 table to the H3 SoC, but with
> OPPs higher than 1008MHz temporarily dropped.
>
> Then there's patches for several tested boards: Orange Pi PC (with
> SY8106A), Orange Pi One/Zero (with GPIO-adjustable SY8113B) and
> ALL-H3-CC (unadjustable).
>
> Icenowy Zheng (5):
>   ARM: sun8i: h3: add operating-points-v2 table for CPU
>   ARM: sun8i: h2+: add SY8113B regulator used by Orange Pi Zero board
>   ARM: sun8i: h3: add SY8113B regulator used by Orange Pi One board
>   ARM: sun8i: h3: fix ALL-H3-CC H3 ver VDD-CPUX voltage
>   ARM: sun8i: h3: set the cpu-supply to VDD-CPUX on ALL-H3-CC H3 ver
>
> Ondrej Jirman (5):
>   dt-bindings: add binding for the SY8106A voltage regulator
>   regulator: add support for SY8106A regulator
>   ARM: sunxi: h3/h5: Add r_i2c pinmux node
>   ARM: sunxi: h3/h5: Add r_i2c I2C controller
>   ARM: sun8i: h3: Add SY8106A regulator to Orange Pi PC

I've applied all the device tree patches for 4.18, taking into account
comments from Maxime. See


https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.17

Mostly it's just renaming the regulator node names and labels.

Please resend the first two patches to Mark Brown, the regulator
subsystem maintainer. And you might want to mention the branch
above in case he needs a use case reference.

Regards
ChenYu

Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()

2018-04-15 Thread Dave Hansen

On 04/14/2018 07:19 PM, Baoquan He wrote:
>>> Yes, this place is the hardest to understand. The temporary arrays are
>>> allocated beforehand with the size of 'nr_present_sections'. The error
>>> paths you mentioned are caused by allocation failure of mem_map or
>>> map_map, but whether it's the error or success path, the sections must be
>>> marked as present in memory_present(). Error or success paths happened
>>> in alloc_usemap_and_memmap(), while checking if it's error or success
>>> paths happened in the last for_each_present_section_nr() of
>>> sparse_init(), and clear the ms->section_mem_map if it goes along error
>>> paths. This is the key point of this new allocation way.
>> I think you owe some commenting because this is so hard to understand.
> I can arrange and write a code comment above sparse_init() according to
> this patch's git log, do you think it's OK?
> 
> Honestly, it took me several days to write code, while I spent more
> than one week to write the patch log. Writing patch log is really a
> headache to me.

I often find the same: writing the code is the easy part.  Explaining
why it is right is the hard part.

Re: [PATCH v5 1/3] regulator: axp20x: add drivevbus support for axp803

2018-04-15 Thread Chen-Yu Tsai

On Thu, Apr 5, 2018 at 2:46 PM, Maxime Ripard  wrote:
> On Thu, Apr 05, 2018 at 12:11:39PM +0530, Jagan Teki wrote:
>> On Tue, Mar 27, 2018 at 11:01 AM, Jagan Teki  
>> wrote:
>> > Like axp221, axp223, axp813 the axp803 is also supporting external
>> > regulator to drive the  OTG VBus through N_VBUSEN PMIC pin.
>> >
>> > Add support for it.
>> >
>> > Signed-off-by: Jagan Teki 
>> > Reviewed-by: Rob Herring 
>> > Reviewed-by: Chen-Yu Tsai 
>> > ---
>> > Changes for v5:
>> > - Collect Chen-Yu reviewed-by tag
>> > Changes for v4:
>> > - rebase on master
>> > Changes for v3:
>> > - Update drivevbus in table of regulators
>>
>> Can you pick these, has some dependency with drivevbus on other
>> patches.
>
> I'm not the regulator maintainer, nor the AXP maintainer for that
> matter. Mark Brown and Chen-Yu are, respectively.

I've already reviewed all the patches. Please resend the series and
include Mark Brown, the regulator subsystem maintainer. He clearly
isn't in the current recipient list, so no wonder things didn't move
forward. Once he applies the driver bits, we'll apply any pending
device tree changes.

ChenYu

Re: [PATCH] mtd: nand: mtk: use of_device_get_match_data()

2018-04-15 Thread xiaolei li

On Mon, 2018-04-16 at 10:33 +0800, Ryder Lee wrote:
> The usage of of_device_get_match_data() reduces the code size a bit.
> 
> Also, the only way to call .probe() is to match an entry in
> .of_match_table[], so of_device_id cannot be NULL.
> 
> Signed-off-by: Ryder Lee 
> ---
>  drivers/mtd/nand/raw/mtk_ecc.c  |  7 +--
>  drivers/mtd/nand/raw/mtk_nand.c | 10 +-
>  2 files changed, 2 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
> index 40d86a8..6432bd7 100644
> --- a/drivers/mtd/nand/raw/mtk_ecc.c
> +++ b/drivers/mtd/nand/raw/mtk_ecc.c
> @@ -500,7 +500,6 @@ static int mtk_ecc_probe(struct platform_device *pdev)
>   struct device *dev = &pdev->dev;
>   struct mtk_ecc *ecc;
>   struct resource *res;
> - const struct of_device_id *of_ecc_id = NULL;
>   u32 max_eccdata_size;
>   int irq, ret;
>  
> @@ -508,11 +507,7 @@ static int mtk_ecc_probe(struct platform_device *pdev)
>   if (!ecc)
>   return -ENOMEM;
>  
> - of_ecc_id = of_match_device(mtk_ecc_dt_match, &pdev->dev);
> - if (!of_ecc_id)
> - return -ENODEV;
> -
> - ecc->caps = of_ecc_id->data;
> + ecc->caps = of_device_get_match_data(dev);
>  
Thanks.

Reviewed-by: Xiaolei Li 

>   max_eccdata_size = ecc->caps->num_ecc_strength - 1;
>   max_eccdata_size = ecc->caps->ecc_strength[max_eccdata_size];
> diff --git a/drivers/mtd/nand/mtk_nand.c b/drivers/mtd/nand/mtk_nand.c
> index 6977da3..75c845a 100644
> --- a/drivers/mtd/nand/raw/mtk_nand.c
> +++ b/drivers/mtd/nand/raw/mtk_nand.c
> @@ -1434,7 +1434,6 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>   struct device_node *np = dev->of_node;
>   struct mtk_nfc *nfc;
>   struct resource *res;
> - const struct of_device_id *of_nfc_id = NULL;
>   int ret, irq;
>  
>   nfc = devm_kzalloc(dev, sizeof(*nfc), GFP_KERNEL);
> @@ -1452,6 +1451,7 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>   else if (!nfc->ecc)
>   return -ENODEV;
>  
> + nfc->caps = of_device_get_match_data(dev);
>   nfc->dev = dev;
>  
>   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> @@ -1498,14 +1498,6 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>   goto clk_disable;
>   }
>  
> - of_nfc_id = of_match_device(mtk_nfc_id_table, &pdev->dev);
> - if (!of_nfc_id) {
> - ret = -ENODEV;
> - goto clk_disable;
> - }
> -
> - nfc->caps = of_nfc_id->data;
> -
>   platform_set_drvdata(pdev, nfc);
>  
>   ret = mtk_nfc_nand_chips_init(dev, nfc);
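
The argument in the commit message (probe is only reached after a table match, so the match data lookup cannot come back empty) can be illustrated with a small userspace mock. Everything here is hypothetical and only mimics the shape of the kernel mechanism: struct match_entry, get_match_data(), bind_device() and the "vendor,ecc-*" strings are made-up names, not the OF API. The sketch also assumes every table entry carries non-NULL data, which is what makes the unchecked dereference in probe() safe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-variant capability data. */
struct caps {
	int num_ecc_strength;
};

/* Hypothetical match table entry: a compatible string plus its data. */
struct match_entry {
	const char *compatible;
	const void *data;
};

static const struct caps caps_a = { .num_ecc_strength = 5 };
static const struct caps caps_b = { .num_ecc_strength = 9 };

/* Table ends with a sentinel entry whose compatible is NULL. */
static const struct match_entry match_table[] = {
	{ "vendor,ecc-a", &caps_a },
	{ "vendor,ecc-b", &caps_b },
	{ NULL, NULL },
};

/* Return the data of the first matching entry, or NULL if none matches. */
static const void *get_match_data(const struct match_entry *table,
				  const char *compatible)
{
	for (; table->compatible; table++)
		if (strcmp(table->compatible, compatible) == 0)
			return table->data;
	return NULL;
}

/* Driver side: looks the data up again and uses it without a NULL check. */
static int probe(const char *compatible)
{
	const struct caps *caps = get_match_data(match_table, compatible);

	return caps->num_ecc_strength;
}

/* Mock bus core: probe() is called only after a successful match. */
int bind_device(const char *compatible)
{
	if (!get_match_data(match_table, compatible))
		return -1;
	return probe(compatible);
}
```

If a table entry were ever added without data, probe() in this sketch would dereference NULL, so the simplification relies on the table being fully populated.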

Re: [PATCH] printk: Ratelimit messages printed by console drivers

2018-04-15 Thread Sergey Senozhatsky

On (04/16/18 10:47), Sergey Senozhatsky wrote:
> On (04/14/18 11:35), Sergey Senozhatsky wrote:
> > On (04/13/18 10:12), Steven Rostedt wrote:
> > > 
> > > > The interval is set to one hour. It is rather arbitrary selected time.
> > > > It is supposed to be a compromise between never print these messages,
> > > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > > and get information if something changes over time.
> > > 
> > > 
> > > I think an hour is incredibly long. We only allow 100 lines per hour for
> > > printks happening inside another printk?
> > > 
> > > I think 5 minutes (at most) would probably be plenty. One minute may be
> > > good enough.
> > 
> > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > My call would be - up to 1000 lines in a 1 minute interval.
> 
> Well, if we want to basically turn printk_safe() into 
> printk_safe_ratelimited().
> I'm not so sure about it.
> 
> Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> to rate limit it - for no reason.
> 
> Dunno, can we keep printk_safe() the way it is and introduce a new
> printk_safe_ratelimited() specifically for call_console_drivers()?
> 
> Lockdep splat is a one time event, if we lose half of it - we, most
> likely, lose the entire report. And call_console_drivers() is not the
> one and only source of warnings/errors/etc. So if we turn printk_safe
> into printk_safe_ratelimited() [not sure we want to do it] for all
> then I want restrictions to be as low as possible, IOW to log_store()
> as many lines as possible.

One more thing,
I'd really prefer to rate limit the function which flushes per-CPU
printk_safe buffers; not the function that appends new messages to
the per-CPU printk_safe buffers. There is a significant difference.

printk_safe does not help us when we are dealing with any external
locks - and call_console_drivers() is precisely that type of case.
The very next thing to happen after lockdep splat, or spin_lock
debugging report, etc. can be an actual deadlock->panic(). Thus I
want to have the entire report in per-CPU buffer [if possible],
so we can flush_on_panic() per-CPU buffers, or at least move the
data to the logbuf and make it accessible in vmcore. If we rate
limit the function that appends data to the per-CPU buffer then we
may simply suppress [rate limit] the report, so there will be
nothing to flush_on_panic().

-ss
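
To make the numbers in this thread concrete, here is a minimal interval/burst limiter in userspace C. It is a sketch only, not the kernel's ratelimit code: struct ratelimit, ratelimit_ok() and lines_passed() are hypothetical names, time is passed in explicitly as seconds, and the 300-line report size is an assumed figure for a large splat.

```c
#include <stdbool.h>

struct ratelimit {
	long interval;	/* window length, in seconds */
	int burst;	/* lines allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* lines let through in this window */
	int missed;	/* lines suppressed in this window */
};

/* Return true if a line arriving at time `now` may pass. */
bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* The window has expired: start a new one. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/* Count how many of `lines` arriving at the same instant get through. */
int lines_passed(struct ratelimit *rl, long now, int lines)
{
	int passed = 0;

	for (int i = 0; i < lines; i++)
		if (ratelimit_ok(rl, now))
			passed++;
	return passed;
}
```

With interval = 3600 and burst = 100, a 300-line report arriving at once keeps its first 100 lines and a second report a minute later is dropped entirely; with interval = 60 and burst = 1000 both get through. If the limiter sits on the append side, the suppressed lines never reach the per-CPU buffer, so there is nothing left to flush later.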

Re: [PATCH v4 4/4] zram: introduce zram memory tracking

2018-04-15 Thread Minchan Kim

On Sun, Apr 15, 2018 at 09:17:45PM -0700, Randy Dunlap wrote:
> On 04/15/2018 08:31 PM, Minchan Kim wrote:
> > zRam as swap is useful for small memory device. However, swap means
> > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > Especially, once init data for application are touched for launching,
> > they tend to be not accessed any more and finally swapped out.
> > zRAM can store such cold pages as compressed form but it's pointless
> > to keep in memory. Better idea is app developers free them directly
> > rather than remaining them on heap.
> > 
> > This patch tell us last access time of each block of zram via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> > 
> > The output is as follows,
> >   300    75.033841 .wh
> >   301    63.806904 s..
> >   302    63.806919 ..h
> > 
> > First column is zram's block index and 3rd one represents symbol
> > (s: same page w: written page to backing store h: huge page) of the
> > block state. Second column represents the time, in usec resolution, at
> > which the block was last accessed. So above example means the 300th
> > block is accessed at 75.033841 second and it was huge so it was written
> > to the backing store.
> > 
> > An admin can use this information, together with *pagemap*, to find a
> > process's cold or incompressible pages once part of its heap has been
> > swapped out.
> > 
> > Acked-by: Greg Kroah-Hartman 
> > Signed-off-by: Minchan Kim 
> > ---
> >  Documentation/blockdev/zram.txt |  24 ++
> >  drivers/block/zram/Kconfig  |  10 +++
> >  drivers/block/zram/zram_drv.c   | 140 +---
> >  drivers/block/zram/zram_drv.h   |   5 ++
> >  4 files changed, 168 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> > index 78db38d02bc9..45509c7d5716 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
> >  User should set up backing device via /sys/block/zramX/backing_dev
> >  before disksize setting.
> >  
> > += memory tracking
> > +
> > +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> > +zram block. It could be useful to catch cold or incompressible
> > +pages of the proess with*pagemap.
> 
> ?   process
> 
> > +If you enable the feature, you could see block state via
> > +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> > +
> > +  300    75.033841 .wh
> > +  301    63.806904 s..
> > +  302    63.806919 ..h
> > +
> > +First column is zram's block index.
> > +Second column is access time.
> > +Third column is state of the block.
> > +(s: same page
> > +w: written page to backing store
> > +h: huge page)
> > +
> > +First line of above example says 300th block is accessed at 75.033841sec
> > +and the block's state is huge so it is written back to the backing
> > +storage. It's a debugging feature so anyone shouldn't rely on it to work
> > +properly.
> > +
> >  Nitin Gupta
> >  ngu...@vflare.org
> > diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> > index ac3a31d433b2..01090338fb47 100644
> > --- a/drivers/block/zram/Kconfig
> > +++ b/drivers/block/zram/Kconfig
> > @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
> >  /sys/block/zramX/backing_dev.
> >  
> >  See zram.txt for more infomration.
> > +
> > +config ZRAM_MEMORY_TRACKING
> > +   bool "Tracking zram block status"
> 
>   bool "Track zram block status"
> 
> although sometimes it is zRam or zRAM.
> 
> 
> > +   depends on ZRAM && DEBUG_FS
> > +   help
> > + With this feature, admin can track the state of allocated block
> 
>   blocks
> 
> > + of zRAM. Admin could see the information via
> > + /sys/kernel/debug/zram/zramX/block_state.
> > +
> > + See zram.txt for more information.
> 
> See Documentation/blockdev/zram.txt for more information.

I've just fixed these. I'll wait for more feedback and then resend.
Thanks for the review!
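
For reference, the block_state line format described in the patch (block index, access time, three state characters) could be read from userspace roughly as follows. This is only a sketch: struct zram_block_state and parse_block_state() are hypothetical names, and it assumes the state column is always three characters in the fixed order s, w, h with '.' for an unset flag, as in the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace view of one block_state line. */
struct zram_block_state {
	unsigned long index; /* zram block index */
	double access_sec;   /* last access time, in seconds */
	int same;            /* 's': same page */
	int written;         /* 'w': written to backing store */
	int huge;            /* 'h': huge page */
};

/* Parse a line such as "300    75.033841 .wh"; returns 0 on success. */
static int parse_block_state(const char *line, struct zram_block_state *st)
{
	char flags[4];

	assert(line && st);
	if (sscanf(line, "%lu %lf %3s", &st->index, &st->access_sec, flags) != 3)
		return -1;
	if (strlen(flags) != 3)
		return -1;
	/* Each position holds either its flag letter or '.' */
	if ((flags[0] != 's' && flags[0] != '.') ||
	    (flags[1] != 'w' && flags[1] != '.') ||
	    (flags[2] != 'h' && flags[2] != '.'))
		return -1;
	st->same = flags[0] == 's';
	st->written = flags[1] == 'w';
	st->huge = flags[2] == 'h';
	return 0;
}
```

Since the feature is documented as debugging-only, a real tool should not depend on this format staying stable.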

[PATCH] perf tools: set kernel end address properly

2018-04-15 Thread Namhyung Kim

map_groups__fixup_end() was called to set the end addresses of the kernel
map and module maps.  But machine__create_modules() now sets the end
address of each module properly, so the only remaining piece is the kernel
map.  We can set it directly from the adjacent module's start address
instead of calling map_groups__fixup_end().  If there is no module after
the kernel map, the end address will be ~0ULL.

Reported-by: Kim Phillips 
Signed-off-by: Namhyung Kim 
---
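
The rule described above can be sketched in isolation as follows. This is a simplified illustration, not the perf code: struct map_sketch and kernel_map_end() are hypothetical names, and the address-ordered maps are modelled as a plain linked list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map: only what the rule needs. */
struct map_sketch {
	uint64_t start;
	uint64_t end;
	struct map_sketch *next; /* next map in address order, or NULL */
};

/*
 * End address of the kernel map: the start of the adjacent (next)
 * map if there is one, otherwise the maximum address.
 */
static uint64_t kernel_map_end(const struct map_sketch *kernel)
{
	uint64_t end = UINT64_MAX; /* same value as ~0ULL */

	assert(kernel);
	if (kernel->next)
		end = kernel->next->start;
	return end;
}
```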
 tools/perf/util/machine.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2eca8478e24f..be328416de61 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1019,13 +1019,6 @@ int machine__load_vmlinux_path(struct machine *machine, enum map_type type)
return ret;
 }
 
-static void map_groups__fixup_end(struct map_groups *mg)
-{
-   int i;
-   for (i = 0; i < MAP__NR_TYPES; ++i)
-   __map_groups__fixup_end(mg, i);
-}
-
 static char *get_kernel_version(const char *root_dir)
 {
char version[PATH_MAX];
@@ -1233,7 +1226,9 @@ int machine__create_kernel_maps(struct machine *machine)
 {
struct dso *kernel = machine__get_kernel(machine);
const char *name = NULL;
+   struct map *map;
u64 addr = 0;
+   u64 end = ~0ULL;
int ret;
 
if (kernel == NULL)
@@ -1259,13 +1254,14 @@ int machine__create_kernel_maps(struct machine *machine)
machine__destroy_kernel_maps(machine);
return -1;
}
-   machine__set_kernel_mmap(machine, addr, 0);
}
 
-   /*
-* Now that we have all the maps created, just set the ->end of them:
-*/
-   map_groups__fixup_end(&machine->kmaps);
+   /* update end address of the kernel map using adjacent module address */
+   map = map__next(machine__kernel_map(machine));
+   if (map)
+   end = map->start;
+
+   machine__set_kernel_mmap(machine, addr, end);
return 0;
 }
 
-- 
2.16.2

[PATCH 05/25] staging: lustre: libcfs: remove excess space

2018-04-15 Thread James Simmons

From: Amir Shehata 

The function cfs_cpt_table_print() was adding two spaces
to the string buffer where one is intended. Add the space only once.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
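
A small standalone sketch of the formatting pattern in question. It assumes, as an illustration only, that each CPU number is appended with a leading space after the "%d\t:" prefix; format_partition() is a hypothetical helper, not the libcfs function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<idx>\t:" followed by " <cpu>" for each
 * CPU. Because every CPU already brings its own leading space, the
 * prefix must not end in a space, or the output gets two in a row.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int format_partition(char *buf, int len, int idx,
			    const int *cpus, int ncpus)
{
	char *tmp = buf;
	int rc, i;

	assert(buf && (cpus || ncpus == 0));
	if (len <= 0)
		return -1;

	rc = snprintf(tmp, (size_t)len, "%d\t:", idx);
	if (rc < 0 || rc >= len)
		return -1;
	tmp += rc;
	len -= rc;

	for (i = 0; i < ncpus; i++) {
		rc = snprintf(tmp, (size_t)len, " %d", cpus[i]);
		if (rc < 0 || rc >= len)
			return -1;
		tmp += rc;
		len -= rc;
	}
	return (int)(tmp - buf);
}
```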
 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index d207ae5..b2a88ef 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -147,7 +147,7 @@ struct cfs_cpt_table *
 
for (i = 0; i < cptab->ctb_nparts; i++) {
if (len > 0) {
-   rc = snprintf(tmp, len, "%d\t: ", i);
+   rc = snprintf(tmp, len, "%d\t:", i);
len -= rc;
}
 
-- 
1.8.3.1

[PATCH 02/22] staging: lustre: obd: create it_has_reply_body()

2018-04-15 Thread James Simmons

From: Vitaly Fertman 

In many places the it_op field of a lookup_intent is tested against
IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR.
Create a simple inline function for this common case.

Signed-off-by: Vitaly Fertman 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-on: http://review.whamcloud.com/17220
Reviewed-by: Andrew Perepechko 
Reviewed-by: Andriy Skulysh 
Tested-by: Elena V. Gryaznova 
Reviewed-by: John L. Hammond 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
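
The helper is an ordinary bitmask predicate. A standalone sketch of the pattern follows; the flag values and struct lookup_intent_sketch are placeholders chosen for illustration, not the real Lustre definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for illustration; the real ones live in Lustre. */
enum {
	IT_OPEN     = 1 << 0,
	IT_UNLINK   = 1 << 1,
	IT_LOOKUP   = 1 << 2,
	IT_GETATTR  = 1 << 3,
	IT_GETXATTR = 1 << 4,
};

/* Hypothetical cut-down intent: only the operation bits. */
struct lookup_intent_sketch {
	int it_op;
};

/* True if the operation is one of those that carry a full reply body. */
static inline bool it_has_reply_body(const struct lookup_intent_sketch *it)
{
	assert(it != NULL);
	return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR);
}
```

Callers can then write if (it_has_reply_body(it)) instead of repeating the mask at each site.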
 drivers/staging/lustre/lustre/include/obd.h   | 10 ++
 drivers/staging/lustre/lustre/mdc/mdc_locks.c |  2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index f1233ca..ea6056b 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -686,6 +686,16 @@ enum md_cli_flags {
CLI_MIGRATE = BIT(4),
 };
 
+/**
+ * GETXATTR is not included as only a couple of fields in the reply body
+ * is filled, but not FID which is needed for common intent handling in
+ * mdc_finish_intent_lock()
+ */
+static inline bool it_has_reply_body(const struct lookup_intent *it)
+{
+   return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR);
+}
+
 struct md_op_data {
struct lu_fid  op_fid1; /* operation fid1 (usually parent) */
struct lu_fid  op_fid2; /* operation fid2 (usually child) */
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 695ef44..309ead1 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -568,7 +568,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
  it->it_op, it->it_disposition, it->it_status);
 
/* We know what to expect, so we do any byte flipping required here */
-   if (it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR)) {
+   if (it_has_reply_body(it)) {
struct mdt_body *body;
 
body = req_capsule_server_get(pill, &RMF_MDT_BODY);
-- 
1.8.3.1

[PATCH 01/22] staging: lustre: llite: initialize xattr->xe_namelen

2018-04-15 Thread James Simmons

When the allocation of xattr->xe_name was moved to kstrdup(),
the assignment of xattr->xe_namelen was dropped. This field is used
in several parts of the xattr cache code, so xattr handling broke.
Initialize xattr->xe_namelen once allocating xattr->xe_name
succeeds. Also change the debugging statement to report the xattr
name instead of its length, which wasn't even being set.

Fixes: b3dd8957c23a ("staging: lustre: lustre: llite: Use kstrdup")
Signed-off-by: James Simmons 
---
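
The bug class is easy to reproduce outside the kernel: once a string copy is done by a helper that allocates for you, nothing reminds you to keep a separate length field in sync. A userspace sketch of the corrected pattern, with malloc() standing in for the kernel allocation; struct xattr_entry_sketch and xattr_entry_set_name() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down cache entry: name plus its stored length. */
struct xattr_entry_sketch {
	char *xe_name;
	unsigned int xe_namelen; /* includes the trailing NUL */
};

/*
 * Duplicate 'name' into the entry and record its length, NUL included,
 * only after the allocation has succeeded. Returns 0 on success.
 */
static int xattr_entry_set_name(struct xattr_entry_sketch *x, const char *name)
{
	size_t len;

	assert(x && name);
	len = strlen(name) + 1;
	x->xe_name = malloc(len);
	if (!x->xe_name)
		return -1;
	memcpy(x->xe_name, name, len);
	x->xe_namelen = (unsigned int)len;
	return 0;
}
```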
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 4dc799d..ef66949 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -121,10 +121,12 @@ static int ll_xattr_cache_add(struct list_head *cache,
 
xattr->xe_name = kstrdup(xattr_name, GFP_NOFS);
if (!xattr->xe_name) {
-   CDEBUG(D_CACHE, "failed to alloc xattr name %u\n",
-  xattr->xe_namelen);
+   CDEBUG(D_CACHE, "failed to alloc xattr name %s\n",
+  xattr_name);
goto err_name;
}
+   xattr->xe_namelen = strlen(xattr_name) + 1;
+
xattr->xe_value = kmemdup(xattr_val, xattr_val_len, GFP_NOFS);
if (!xattr->xe_value)
goto err_value;
-- 
1.8.3.1

[PATCH 00/22] staging: lustre: llite: fix xattr handling

2018-04-15 Thread James Simmons

From: James Simmons 

Lustre utilities and userland APIs heavily depend on special xattr
handling. Sadly, much of the xattr handling in the lustre client has
been broken for a while. These are all the fixes needed to make xattr
handling work properly with the latest kernels.

Bobi Jam (3):
  staging: lustre: llite: break up ll_setstripe_ea function
  staging: lustre: llite: return from ll_adjust_lum() if lump is NULL
  staging: lustre: llite: eat -EEXIST on setting trusted.lov

Dmitry Eremin (1):
  staging: lustre: llite: add support set_acl method in inode operations

James Simmons (9):
  staging: lustre: llite: initialize xattr->xe_namelen
  staging: lustre: llite: fix invalid size test in ll_setstripe_ea()
  staging: lustre: llite: remove newline in fullname strings
  staging: lustre: llite: record in stats attempted removal of lma/link xattr
  staging: lustre: llite: cleanup posix acl xattr code
  staging: lustre: llite: use proper types in the xattr code
  staging: lustre: llite: cleanup xattr code comments
  staging: lustre: llite: style changes in xattr.c
  staging: lustre: llite: correct removexattr detection

John L. Hammond (3):
  staging: lustre: llite: handle xattr cache refill race
  staging: lustre: llite: use xattr_handler name for ACLs
  staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

Niu Yawei (2):
  staging: lustre: llite: refactor lustre.lov xattr handling
  staging: lustre: llite: add simple comment about lustre.lov xattrs

Robin Humble (1):
  staging: lustre: llite: Remove filtering of seclabel xattr

Vitaly Fertman (3):
  staging: lustre: obd: create it_has_reply_body()
  staging: lustre: obd: change debug reporting in lmv_enqueue()
  staging: lustre: ldlm: xattr locks are lost on mdt

 drivers/staging/lustre/lustre/include/obd.h|  20 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  24 +--
 drivers/staging/lustre/lustre/llite/file.c |  86 ++--
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +
 drivers/staging/lustre/lustre/llite/namei.c|  10 +-
 drivers/staging/lustre/lustre/llite/xattr.c| 231 -
 drivers/staging/lustre/lustre/llite/xattr_cache.c  |  83 +++-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |  12 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|  36 ++--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  68 --
 drivers/staging/lustre/lustre/mdc/mdc_request.c|  34 +--
 12 files changed, 364 insertions(+), 248 deletions(-)

-- 
1.8.3.1

[PATCH 04/22] staging: lustre: ldlm: xattr locks are lost on mdt

2018-04-15 Thread James Simmons

From: Vitaly Fertman 

On the server side, mdt_intent_getxattr() can return EFAULT if a
buffer cannot be found. The error is returned after lock_replace, where
a new lock is installed into lockp. An error forces ldlm_lock_enqueue()
to destroy the original lock, but ldlm_handle_enqueue0() drops the
reference on the new lock. The xattr client code assumed that an intent
error is returned under a lock, which is immediately cancelled.
Check whether a lock was obtained and cancel it properly in error cases.
Note: for interop we should support both cases, an intent error
returned under a lock and one returned with a lock abort. Keep
returning a lock with an intent error for now, for interop purposes;
this can be dropped later, once clients are old enough. Make all
intent ops, getxattr and layout, work through md_intent_lock, which
should extract the intent error.

Signed-off-by: Vitaly Fertman 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-on: http://review.whamcloud.com/17220
Reviewed-by: Andrew Perepechko 
Reviewed-by: Andriy Skulysh 
Tested-by: Elena V. Gryaznova 
Reviewed-by: John L. Hammond 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/include/obd.h   |  3 +-
 drivers/staging/lustre/lustre/include/obd_class.h |  3 +-
 drivers/staging/lustre/lustre/llite/file.c| 16 ++---
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 75 ---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c| 12 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c   |  7 +--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h  |  4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 66 ++--
 8 files changed, 95 insertions(+), 91 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index ea6056b..48cf7ab 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -909,8 +909,7 @@ struct md_ops {
  const void *, size_t, umode_t, uid_t, gid_t,
  cfs_cap_t, __u64, struct ptlrpc_request **);
int (*enqueue)(struct obd_export *, struct ldlm_enqueue_info *,
-  const union ldlm_policy_data *,
-  struct lookup_intent *, struct md_op_data *,
+  const union ldlm_policy_data *, struct md_op_data *,
   struct lustre_handle *, __u64);
int (*getattr)(struct obd_export *, struct md_op_data *,
   struct ptlrpc_request **);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 176b63e..a76f016 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1241,7 +1241,6 @@ static inline int md_create(struct obd_export *exp, struct md_op_data *op_data,
 static inline int md_enqueue(struct obd_export *exp,
 struct ldlm_enqueue_info *einfo,
 const union ldlm_policy_data *policy,
-struct lookup_intent *it,
 struct md_op_data *op_data,
 struct lustre_handle *lockh,
 __u64 extra_lock_flags)
@@ -1250,7 +1249,7 @@ static inline int md_enqueue(struct obd_export *exp,
 
EXP_CHECK_MD_OP(exp, enqueue);
EXP_MD_COUNTER_INCREMENT(exp, enqueue);
-   rc = MDP(exp->exp_obd, enqueue)(exp, einfo, policy, it, op_data, lockh,
+   rc = MDP(exp->exp_obd, enqueue)(exp, einfo, policy, op_data, lockh,
extra_lock_flags);
return rc;
 }
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index ca5faea..0026fde 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2514,7 +2514,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
   PFID(ll_inode2fid(inode)), flock.l_flock.pid, flags,
   einfo.ei_mode, flock.l_flock.start, flock.l_flock.end);
 
-   rc = md_enqueue(sbi->ll_md_exp, &einfo, &flock, NULL, op_data, &lockh,
+   rc = md_enqueue(sbi->ll_md_exp, &einfo, &flock, op_data, &lockh,
flags);
 
/* Restore the file lock type if not TEST lock. */
@@ -2527,7 +2527,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 
if (rc2 && file_lock->fl_type != F_UNLCK) {
einfo.ei_mode = LCK_NL;
-   md_enqueue(sbi->ll_md_exp, &einfo, &flock, NULL, op_data,
+   md_enqueue(sbi->ll_md_exp, &einfo, &flock, op_data,
   &lockh, flags);
rc = rc2;
}
@@ -3474,12 +3474,7 @@ static int ll_layout_refresh_locked(struct inode *inode)
struct lookup_intent   it;
struct lustre_handle   lockh;

[PATCH 01/22] staging: lustre: llite: initialize xattr->xe_namelen

2018-04-15 Thread James Simmons

When the allocation of xattr->xe_name was moved to kstrdup(),
setting xattr->xe_namelen was dropped. This field is used
in several parts of the xattr cache code, so this broke xattr
handling. Initialize xattr->xe_namelen when allocating
xattr->xe_name succeeds. Also change the debugging statement
to report the xattr name instead of its length, which
wasn't even being set.

Fixes: b3dd8957c23a ("staging: lustre: lustre: llite: Use kstrdup")
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 4dc799d..ef66949 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -121,10 +121,12 @@ static int ll_xattr_cache_add(struct list_head *cache,
 
xattr->xe_name = kstrdup(xattr_name, GFP_NOFS);
if (!xattr->xe_name) {
-   CDEBUG(D_CACHE, "failed to alloc xattr name %u\n",
-  xattr->xe_namelen);
+   CDEBUG(D_CACHE, "failed to alloc xattr name %s\n",
+  xattr_name);
goto err_name;
}
+   xattr->xe_namelen = strlen(xattr_name) + 1;
+
xattr->xe_value = kmemdup(xattr_val, xattr_val_len, GFP_NOFS);
if (!xattr->xe_value)
goto err_value;
-- 
1.8.3.1
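
The fix described in this patch can be sketched in user space as follows. This is only an illustration under stated assumptions, not the llite code: `struct xattr_entry` and `xattr_entry_set_name()` are hypothetical names, and `strdup()` stands in for the kernel's `kstrdup()`. The point is that duplicating the string does not record its length, so the length field (including the terminating NUL, as in the patch) has to be set explicitly once the duplicate succeeds.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached xattr entry; not the llite struct. */
struct xattr_entry {
	char *name;      /* duplicated, NUL-terminated name */
	size_t namelen;  /* strlen(name) + 1, as the patch stores it */
};

/*
 * Duplicate the name and record its length. strdup() plays the role of
 * kstrdup() here; neither fills in a separate length field for us.
 */
static int xattr_entry_set_name(struct xattr_entry *e, const char *name)
{
	e->name = strdup(name);
	if (!e->name)
		return -ENOMEM;

	/* The step that went missing when the code moved to kstrdup(). */
	e->namelen = strlen(name) + 1;
	return 0;
}
```

A caller would check the return value and later release `e->name` with `free()`.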

[PATCH 00/22] staging: lustre: llite: fix xattr handling

2018-04-15 Thread James Simmons

From: James Simmons 

Lustre utilities and user land APIs heavily depend on special xattr
handling. Sadly, much of the xattr handling for the Lustre client has
been broken for a while. These are all the fixes needed to make xattr
handling work properly with the latest kernels.

Bobi Jam (3):
  staging: lustre: llite: break up ll_setstripe_ea function
  staging: lustre: llite: return from ll_adjust_lum() if lump is NULL
  staging: lustre: llite: eat -EEXIST on setting trusted.lov

Dmitry Eremin (1):
  staging: lustre: llite: add support set_acl method in inode operations

James Simmons (9):
  staging: lustre: llite: initialize xattr->xe_namelen
  staging: lustre: llite: fix invalid size test in ll_setstripe_ea()
  staging: lustre: llite: remove newline in fullname strings
  staging: lustre: llite: record in stats attempted removal of lma/link xattr
  staging: lustre: llite: cleanup posix acl xattr code
  staging: lustre: llite: use proper types in the xattr code
  staging: lustre: llite: cleanup xattr code comments
  staging: lustre: llite: style changes in xattr.c
  staging: lustre: llite: correct removexattr detection

John L. Hammond (3):
  staging: lustre: llite: handle xattr cache refill race
  staging: lustre: llite: use xattr_handler name for ACLs
  staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

Niu Yawei (2):
  staging: lustre: llite: refactor lustre.lov xattr handling
  staging: lustre: llite: add simple comment about lustre.lov xattrs

Robin Humble (1):
  staging: lustre: llite: Remove filtering of seclabel xattr

Vitaly Fertman (3):
  staging: lustre: obd: create it_has_reply_body()
  staging: lustre: obd: change debug reporting in lmv_enqueue()
  staging: lustre: ldlm: xattr locks are lost on mdt

 drivers/staging/lustre/lustre/include/obd.h|  20 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  24 +--
 drivers/staging/lustre/lustre/llite/file.c |  86 ++--
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +
 drivers/staging/lustre/lustre/llite/namei.c|  10 +-
 drivers/staging/lustre/lustre/llite/xattr.c| 231 -
 drivers/staging/lustre/lustre/llite/xattr_cache.c  |  83 +++-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |  12 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|  36 ++--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  68 --
 drivers/staging/lustre/lustre/mdc/mdc_request.c|  34 +--
 12 files changed, 364 insertions(+), 248 deletions(-)

-- 
1.8.3.1

[PATCH 07/22] staging: lustre: llite: refactor lustre.lov xattr handling

2018-04-15 Thread James Simmons

From: Niu Yawei 

The function ll_xattr_set() contains special code to handle
the Lustre-specific xattr lustre.lov. Move all this code to
a new function ll_setstripe_ea().

Signed-off-by: Bobi Jam 
Signed-off-by: Niu Yawei 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24851
Reviewed-by: Andreas Dilger 
Reviewed-by: Lai Siyao 
Reviewed-by: Jinshan Xiong 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 131 +++-
 1 file changed, 69 insertions(+), 62 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 55a19a5..1b462e4 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -186,6 +186,73 @@ static int get_hsm_state(struct inode *inode, u32 *hus_states)
return rc;
 }
 
+static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
+  size_t size)
+{
+   struct inode *inode = d_inode(dentry);
+   int rc = 0;
+
+   if (size != 0 && size < sizeof(struct lov_user_md))
+   return -EINVAL;
+
+   /*
+* It is possible to set an xattr to a "" value of zero size.
+* For this case we are going to treat it as a removal.
+*/
+   if (!size && lump)
+   lump = NULL;
+
+   /* Attributes that are saved via getxattr will always have
+* the stripe_offset as 0.  Instead, the MDS should be
+* allowed to pick the starting OST index.   b=17846
+*/
+   if (lump && lump->lmm_stripe_offset == 0)
+   lump->lmm_stripe_offset = -1;
+
+   /* Avoid anyone directly setting the RELEASED flag. */
+   if (lump && (lump->lmm_pattern & LOV_PATTERN_F_RELEASED)) {
+   /* Only if we have a released flag check if the file
+* was indeed archived.
+*/
+   u32 state = HS_NONE;
+
+   rc = get_hsm_state(inode, &state);
+   if (rc)
+   return rc;
+
+   if (!(state & HS_ARCHIVED)) {
+   CDEBUG(D_VFSTRACE,
+  "hus_states state = %x, pattern = %x\n",
+   state, lump->lmm_pattern);
+   /*
+* Here the state is: real file is not
+* archived but user is requesting to set
+* the RELEASED flag so we mask off the
+* released flag from the request
+*/
+   lump->lmm_pattern ^= LOV_PATTERN_F_RELEASED;
+   }
+   }
+
+   if (lump && S_ISREG(inode->i_mode)) {
+   __u64 it_flags = FMODE_WRITE;
+   int lum_size;
+
+   lum_size = ll_lov_user_md_size(lump);
+   if (lum_size < 0 || size < lum_size)
+   return 0; /* b=10667: ignore error */
+
+   rc = ll_lov_setstripe_ea_info(inode, dentry, it_flags, lump,
+ lum_size);
+   /* b=10667: rc always be 0 here for now */
+   rc = 0;
+   } else if (S_ISDIR(inode->i_mode)) {
+   rc = ll_dir_setstripe(inode, lump, 0);
+   }
+
+   return rc;
+}
+
 static int ll_xattr_set(const struct xattr_handler *handler,
struct dentry *dentry, struct inode *inode,
const char *name, const void *value, size_t size,
@@ -198,73 +265,13 @@ static int ll_xattr_set(const struct xattr_handler *handler,
   PFID(ll_inode2fid(inode)), inode, name);
 
if (!strcmp(name, "lov")) {
-   struct lov_user_md *lump = (struct lov_user_md *)value;
int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR :
   LPROC_LL_SETXATTR;
-   int rc = 0;
 
ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1);
 
-   if (size != 0 && size < sizeof(struct lov_user_md))
-   return -EINVAL;
-
-   /*
-* It is possible to set an xattr to a "" value of zero size.
-* For this case we are going to treat it as a removal.
-*/
-   if (!size && lump)
-   lump = NULL;
-
-   /* Attributes that are saved via getxattr will always have
-* the stripe_offset as 0.  Instead, the MDS should be
-* allowed to pick the starting OST index.   b=17846
-*/
-   if (lump && lump->lmm_stripe_offset == 0)
-   lump->lmm_stripe_offset = -1;
-
-
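
The first checks that this patch moves into ll_setstripe_ea() can be sketched in user space roughly as below. This is a simplified illustration, not the kernel code: `struct lum_sketch`, `SKETCH_OFFSET_ANY` and `normalize_layout_request()` are hypothetical names, and the struct is a reduced stand-in for `struct lov_user_md`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct lov_user_md. */
struct lum_sketch {
	uint32_t pattern;
	uint16_t stripe_offset;
};

#define SKETCH_OFFSET_ANY ((uint16_t)-1) /* "let the server choose" */

/*
 * Mirrors the order of the checks in the patch: reject short buffers,
 * treat an empty value as a removal, and turn a saved offset of 0 back
 * into "any". On success *lumpp is NULL when the request is a removal.
 */
static int normalize_layout_request(struct lum_sketch **lumpp, size_t size)
{
	if (size != 0 && size < sizeof(struct lum_sketch))
		return -EINVAL;

	/* A zero-sized value is handled as a removal request. */
	if (size == 0)
		*lumpp = NULL;

	/* Values saved via getxattr carry offset 0; do not pin it. */
	if (*lumpp && (*lumpp)->stripe_offset == 0)
		(*lumpp)->stripe_offset = SKETCH_OFFSET_ANY;

	return 0;
}
```

Passing the pointer by reference is a choice made for the sketch so the caller can see that the request became a removal; the patch itself simply reassigns its local `lump`.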

[PATCH 05/22] staging: lustre: llite: handle xattr cache refill race

2018-04-15 Thread James Simmons

From: "John L. Hammond" 

In ll_xattr_cache_refill(), if the xattr cache was invalid (and no
request was sent), return -EAGAIN so that the ll_getxattr_common()
caller will fetch the xattr from the MDT.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10132
Reviewed-on: https://review.whamcloud.com/29654
Reviewed-by: Andreas Dilger 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 53dfaea..5da69ba0 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -357,7 +357,7 @@ static int ll_xattr_cache_refill(struct inode *inode)
if (unlikely(!req)) {
CDEBUG(D_CACHE, "cancelled by a parallel getxattr\n");
ll_intent_drop_lock();
-   rc = -EIO;
+   rc = -EAGAIN;
goto err_unlock;
}
 
-- 
1.8.3.1
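
The reason -EAGAIN is the more useful return value here can be sketched with a small user-space model. All names below (`struct cache_sketch`, `cache_lookup()`, `fetch_from_server()`, `get_value()`) are hypothetical and only illustrate the pattern: a cache miss that is reported as "try again" lets the caller fall back to the slow path, whereas -EIO would be passed up as a hard failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache state; the real code keeps this per inode. */
struct cache_sketch {
	bool valid;
	const char *value;
};

/* Returns -EAGAIN when the cache cannot answer the request. */
static int cache_lookup(const struct cache_sketch *c, const char **out)
{
	if (!c->valid)
		return -EAGAIN;
	*out = c->value;
	return 0;
}

/* Hypothetical slow path standing in for a request to the server. */
static int fetch_from_server(const char **out)
{
	*out = "from-server";
	return 0;
}

/* Try the cache first; on -EAGAIN fall back to the server. */
static int get_value(const struct cache_sketch *c, const char **out)
{
	int rc = cache_lookup(c, out);

	if (rc == -EAGAIN)
		rc = fetch_from_server(out);
	return rc;
}
```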

[PATCH 10/22] staging: lustre: llite: return from ll_adjust_lum() if lump is NULL

2018-04-15 Thread James Simmons

From: Bobi Jam 

No need to check several times if lump is NULL. Just test once and
return 0 if NULL.

Signed-off-by: Bobi Jam 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484
Reviewed-on: https://review.whamcloud.com/27126
Reviewed-by: Dmitry Eremin 
Reviewed-by: Niu Yawei 
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 78ce85b..56ac07e 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -190,15 +190,18 @@ static int ll_adjust_lum(struct inode *inode, struct lov_user_md *lump)
 {
int rc = 0;
 
+   if (!lump)
+   return 0;
+
/* Attributes that are saved via getxattr will always have
 * the stripe_offset as 0.  Instead, the MDS should be
 * allowed to pick the starting OST index.   b=17846
 */
-   if (lump && lump->lmm_stripe_offset == 0)
+   if (lump->lmm_stripe_offset == 0)
lump->lmm_stripe_offset = -1;
 
/* Avoid anyone directly setting the RELEASED flag. */
-   if (lump && (lump->lmm_pattern & LOV_PATTERN_F_RELEASED)) {
+   if (lump->lmm_pattern & LOV_PATTERN_F_RELEASED) {
/* Only if we have a released flag check if the file
 * was indeed archived.
 */
-- 
1.8.3.1
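
A user-space sketch of the resulting shape, assuming simplified types: `struct layout_req`, `SKETCH_F_RELEASED` and `adjust_layout()` are hypothetical names, and the `archived` argument stands in for the HSM state lookup done in the kernel code. The sketch clears the flag with `&= ~`, which has the same effect as the `^=` in the kernel code because the flag is known to be set at that point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_F_RELEASED 0x80000000u /* illustrative flag value */

/* Hypothetical, reduced stand-in for struct lov_user_md. */
struct layout_req {
	uint32_t pattern;
	uint16_t stripe_offset;
};

static int adjust_layout(struct layout_req *req, bool archived)
{
	/* Checked once, up front, instead of in every condition below. */
	if (!req)
		return 0;

	/* Offset 0 means "saved via getxattr": let the server choose. */
	if (req->stripe_offset == 0)
		req->stripe_offset = (uint16_t)-1;

	/* Drop RELEASED unless the file really is archived. */
	if ((req->pattern & SKETCH_F_RELEASED) && !archived)
		req->pattern &= ~SKETCH_F_RELEASED;

	return 0;
}
```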

Re: [PATCH v4 4/4] zram: introduce zram memory tracking

2018-04-15 Thread Randy Dunlap

On 04/15/2018 08:31 PM, Minchan Kim wrote:
> zRam as swap is useful for small memory device. However, swap means
> those pages on zram are mostly cold pages due to VM's LRU algorithm.
> Especially, once init data for application are touched for launching,
> they tend to be not accessed any more and finally swapped out.
> zRAM can store such cold pages as compressed form but it's pointless
> to keep in memory. Better idea is app developers free them directly
> rather than remaining them on heap.
> 
> This patch tell us last access time of each block of zram via
> "cat /sys/kernel/debug/zram/zram0/block_state".
> 
> The output is as follows,
>   300    75.033841 .wh
>   301    63.806904 s..
>   302    63.806919 ..h
> 
> First column is zram's block index and 3rh one represents symbol
> (s: same page w: written page to backing store h: huge page) of the
> block state. Second column represents usec time unit of the block
> was last accessed. So above example means the 300th block is accessed
> at 75.033851 second and it was huge so it was written to the backing
> store.
> 
> Admin can leverage this information to catch cold|incompressible pages
> of process with *pagemap* once part of heaps are swapped out.
> 
> Acked-by: Greg Kroah-Hartman 
> Signed-off-by: Minchan Kim 
> ---
>  Documentation/blockdev/zram.txt |  24 ++
>  drivers/block/zram/Kconfig  |  10 +++
>  drivers/block/zram/zram_drv.c   | 140 +---
>  drivers/block/zram/zram_drv.h   |   5 ++
>  4 files changed, 168 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 78db38d02bc9..45509c7d5716 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
>  User should set up backing device via /sys/block/zramX/backing_dev
>  before disksize setting.
>  
> += memory tracking
> +
> +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> +zram block. It could be useful to catch cold or incompressible
> +pages of the proess with*pagemap.

?   process

> +If you enable the feature, you could see block state via
> +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> +
> +   300    75.033841 .wh
> +   301    63.806904 s..
> +   302    63.806919 ..h
> +
> +First column is zram's block index.
> +Second column is access time.
> +Third column is state of the block.
> +(s: same page
> +w: written page to backing store
> +h: huge page)
> +
> +First line of above example says 300th block is accessed at 75.033841sec
> +and the block's state is huge so it is written back to the backing
> +storage. It's a debugging feature so anyone shouldn't rely on it to work
> +properly.
> +
>  Nitin Gupta
>  ngu...@vflare.org
> diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> index ac3a31d433b2..01090338fb47 100644
> --- a/drivers/block/zram/Kconfig
> +++ b/drivers/block/zram/Kconfig
> @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
>/sys/block/zramX/backing_dev.
>  
>See zram.txt for more infomration.
> +
> +config ZRAM_MEMORY_TRACKING
> + bool "Tracking zram block status"

bool "Track zram block status"

although sometimes it is zRam or zRAM.


> + depends on ZRAM && DEBUG_FS
> + help
> +   With this feature, admin can track the state of allocated block

blocks

> +   of zRAM. Admin could see the information via
> +   /sys/kernel/debug/zram/zramX/block_state.
> +
> +   See zram.txt for more information.

  See Documentation/blockdev/zram.txt for more information.


-- 
~Randy
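
For readers who want to consume the block_state output described in the quoted documentation, a line could be parsed roughly like this. The sketch assumes three whitespace-separated columns (block index, access time in seconds, and a three-character state field using `s`, `w`, `h` or `.`), which is how the quoted text describes the format; `struct block_state` and `parse_block_state()` are hypothetical names, not part of zram.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form of one block_state line. */
struct block_state {
	unsigned long index; /* zram block index */
	double seconds;      /* last access time */
	bool same;           /* 's': same page */
	bool written;        /* 'w': written to backing store */
	bool huge;           /* 'h': huge page */
};

/* Returns 0 on success, -1 if the line does not match the assumed format. */
static int parse_block_state(const char *line, struct block_state *out)
{
	char flags[4];

	if (sscanf(line, "%lu %lf %3s", &out->index, &out->seconds, flags) != 3)
		return -1;
	if (strlen(flags) != 3)
		return -1;

	out->same = flags[0] == 's';
	out->written = flags[1] == 'w';
	out->huge = flags[2] == 'h';
	return 0;
}
```

Since the documentation calls this a debugging feature, a tool built on it should treat the format as unstable and fail gracefully on lines it cannot parse.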

[PATCH 03/22] staging: lustre: obd: change debug reporting in lmv_enqueue()

2018-04-15 Thread James Simmons

From: Vitaly Fertman 

Remove LL_IT2STR(it) from debug macros in lmv_enqueue(). The
removal makes it possible to simplify the md_enqueue() functions.

Signed-off-by: Vitaly Fertman 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-on: http://review.whamcloud.com/17220
Reviewed-by: Andrew Perepechko 
Reviewed-by: Andriy Skulysh 
Tested-by: Elena V. Gryaznova 
Reviewed-by: John L. Hammond 
Reviewed-by: Lai Siyao 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 7be9310..e1c93cd 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1660,15 +1660,14 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
struct lmv_obd *lmv = &obd->u.lmv;
struct lmv_tgt_desc  *tgt;
 
-   CDEBUG(D_INODE, "ENQUEUE '%s' on " DFID "\n",
-  LL_IT2STR(it), PFID(&op_data->op_fid1));
+   CDEBUG(D_INODE, "ENQUEUE on " DFID "\n", PFID(&op_data->op_fid1));
 
tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
if (IS_ERR(tgt))
return PTR_ERR(tgt);
 
-   CDEBUG(D_INODE, "ENQUEUE '%s' on " DFID " -> mds #%u\n",
-  LL_IT2STR(it), PFID(&op_data->op_fid1), tgt->ltd_idx);
+   CDEBUG(D_INODE, "ENQUEUE on " DFID " -> mds #%u\n",
+  PFID(&op_data->op_fid1), tgt->ltd_idx);
 
return md_enqueue(tgt->ltd_exp, einfo, policy, it, op_data, lockh,
extra_lock_flags);
-- 
1.8.3.1

[PATCH 06/22] staging: lustre: llite: Remove filtering of seclabel xattr

2018-04-15 Thread James Simmons

From: Robin Humble 

The security.capability xattr is used to implement File
Capabilities in recent Linux versions. Capabilities are a
fine-grained approach to granting executables elevated
privileges, e.g. /bin/ping can have capabilities
cap_net_admin, cap_net_raw+ep instead of being setuid root.

This xattr has long been filtered out by llite, initially for
stability reasons (b15587), and later over performance
concerns as this xattr is read for every file with e.g.
'ls --color'. Since LU-2869, xattrs are cached on clients,
alleviating most performance concerns.

Removing llite's filtering of the security.capability xattr
enables using Lustre as a root filesystem, which is used on
some large clusters.

Signed-off-by: Robin Humble 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9562
Reviewed-on: https://review.whamcloud.com/27292
Reviewed-by: John L. Hammond 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 2d78432..55a19a5 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -117,11 +117,6 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
 (handler->flags == XATTR_LUSTRE_T && !strcmp(name, "lov")))
return 0;
 
-   /* b15587: ignore security.capability xattr for now */
-   if ((handler->flags == XATTR_SECURITY_T &&
-!strcmp(name, "capability")))
-   return 0;
-
/* LU-549:  Disable security.selinux when selinux is disabled */
if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
strcmp(name, "selinux") == 0)
@@ -383,10 +378,6 @@ static int ll_xattr_get_common(const struct xattr_handler *handler,
if (rc)
return rc;
 
-   /* b15587: ignore security.capability xattr for now */
-   if ((handler->flags == XATTR_SECURITY_T && !strcmp(name, "capability")))
-   return -ENODATA;
-
/* LU-549:  Disable security.selinux when selinux is disabled */
if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
!strcmp(name, "selinux"))
-- 
1.8.3.1
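
The effect of this patch on the filtering logic can be sketched in user space as follows. `security_xattr_filter()` is a hypothetical helper, and the -EOPNOTSUPP return value for the SELinux case is an assumption made for the sketch (the diff context does not show what the kernel code returns there). What the sketch shows is that "capability" no longer has a special case and passes through like any other security.* name.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical filter for names in the security.* namespace.
 * Returns 0 if the xattr may be handled, a negative errno if filtered.
 */
static int security_xattr_filter(const char *name, bool selinux_enabled)
{
	/* security.selinux stays hidden while SELinux is disabled. */
	if (!selinux_enabled && strcmp(name, "selinux") == 0)
		return -EOPNOTSUPP; /* assumed error code, see lead-in */

	/* No "capability" special case any more: it passes through. */
	return 0;
}
```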

[PATCH 09/22] staging: lustre: llite: break up ll_setstripe_ea function

2018-04-15 Thread James Simmons

From: Bobi Jam 

Place all the handling of information of trusted.lov that
is not stripe related into the new function ll_adjust_lum().
Now ll_setstripe_ea() only handles striping information.

Signed-off-by: Bobi Jam 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484
Reviewed-on: https://review.whamcloud.com/27126
Reviewed-by: Dmitry Eremin 
Reviewed-by: Niu Yawei 
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 37 +++--
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index c1600b9..78ce85b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -186,22 +186,10 @@ static int get_hsm_state(struct inode *inode, u32 *hus_states)
return rc;
 }
 
-static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
-  size_t size)
+static int ll_adjust_lum(struct inode *inode, struct lov_user_md *lump)
 {
-   struct inode *inode = d_inode(dentry);
int rc = 0;
 
-   if (size != 0 && size < sizeof(struct lov_user_md))
-   return -EINVAL;
-
-   /*
-* It is possible to set an xattr to a "" value of zero size.
-* For this case we are going to treat it as a removal.
-*/
-   if (!size && lump)
-   lump = NULL;
-
/* Attributes that are saved via getxattr will always have
 * the stripe_offset as 0.  Instead, the MDS should be
 * allowed to pick the starting OST index.   b=17846
@@ -234,6 +222,29 @@ static int ll_setstripe_ea(struct dentry *dentry, struct 
lov_user_md *lump,
}
}
 
+   return rc;
+}
+
+static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
+  size_t size)
+{
+   struct inode *inode = d_inode(dentry);
+   int rc = 0;
+
+   if (size != 0 && size < sizeof(struct lov_user_md))
+   return -EINVAL;
+
+   /*
+* It is possible to set an xattr to a "" value of zero size.
+* For this case we are going to treat it as a removal.
+*/
+   if (!size && lump)
+   lump = NULL;
+
+   rc = ll_adjust_lum(inode, lump);
+   if (rc)
+   return rc;
+
if (lump && S_ISREG(inode->i_mode)) {
__u64 it_flags = FMODE_WRITE;
int lum_size;
-- 
1.8.3.1
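
Reader's note: the split described in this patch can be sketched in user space as below. This is only an illustration under stated assumptions, not the Lustre code: struct demo_lum, demo_adjust_lum() and demo_setstripe() are hypothetical names, the single stripe_offset field and the 0 -> -1 normalisation are inferred from the comment in the hunk, and the return value 1 for "treated as removal" is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lov_user_md; only one field is modelled. */
struct demo_lum {
	int stripe_offset;	/* 0 as saved via getxattr, -1 = let the server pick */
};

/* Normalisation step: no size handling and no striping calls in here. */
static int demo_adjust_lum(struct demo_lum *lum)
{
	if (!lum)
		return 0;

	/* Assumption taken from the comment in the diff: an offset of 0
	 * is replaced so that the server may choose the starting index. */
	if (lum->stripe_offset == 0)
		lum->stripe_offset = -1;

	return 0;
}

/*
 * Striping step: validates the size, then delegates the normalisation.
 * Returns 0 for a set, 1 for a removal (invented for this sketch),
 * or a negative errno.
 */
static int demo_setstripe(struct demo_lum *lum, size_t size)
{
	int rc;

	if (size != 0 && size < sizeof(struct demo_lum))
		return -EINVAL;

	/* A zero-sized value is treated as a removal. */
	if (!size && lum)
		lum = NULL;

	rc = demo_adjust_lum(lum);
	if (rc)
		return rc;

	return lum ? 0 : 1;
}
```

With a demo_lum whose stripe_offset is 0, demo_setstripe(&lum, sizeof(lum)) returns 0 and leaves the offset at -1; a size of 1 yields -EINVAL, and a size of 0 is reported as a removal.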

[PATCH 06/22] staging: lustre: llite: Remove filtering of seclabel xattr

2018-04-15 Thread James Simmons

From: Robin Humble 

The security.capability xattr is used to implement File
Capabilities in recent Linux versions. Capabilities are a
fine-grained approach to granting executables elevated
privileges, e.g. /bin/ping can have capabilities
cap_net_admin, cap_net_raw+ep instead of being setuid root.

This xattr has long been filtered out by llite, initially for
stability reasons (b15587), and later over performance
concerns, as this xattr is read for every file by e.g.
'ls --color'. Since LU-2869, xattrs are cached on clients,
alleviating most performance concerns.

Removing llite's filtering of the security.capability xattr
enables using Lustre as a root filesystem, which is used on
some large clusters.

Signed-off-by: Robin Humble 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9562
Reviewed-on: https://review.whamcloud.com/27292
Reviewed-by: John L. Hammond 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 2d78432..55a19a5 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -117,11 +117,6 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
 (handler->flags == XATTR_LUSTRE_T && !strcmp(name, "lov"
return 0;
 
-   /* b15587: ignore security.capability xattr for now */
-   if ((handler->flags == XATTR_SECURITY_T &&
-!strcmp(name, "capability")))
-   return 0;
-
/* LU-549:  Disable security.selinux when selinux is disabled */
if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
strcmp(name, "selinux") == 0)
@@ -383,10 +378,6 @@ static int ll_xattr_get_common(const struct xattr_handler 
*handler,
if (rc)
return rc;
 
-   /* b15587: ignore security.capability xattr for now */
-   if ((handler->flags == XATTR_SECURITY_T && !strcmp(name, "capability")))
-   return -ENODATA;
-
/* LU-549:  Disable security.selinux when selinux is disabled */
if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
!strcmp(name, "selinux"))
-- 
1.8.3.1
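
Reader's note: a minimal user-space sketch of the filtering rule that remains after this patch. It assumes a simplified model in which the filter only answers "forward to the server or not"; the enum, the function name demo_xattr_allowed() and the boolean return are hypothetical and do not reproduce the return codes of xattr_type_filter().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical handler classes, standing in for the XATTR_*_T flags. */
enum demo_xattr_type { DEMO_SECURITY, DEMO_TRUSTED, DEMO_USER };

/* Returns true if the xattr would be forwarded to the server. */
static bool demo_xattr_allowed(enum demo_xattr_type type, const char *name,
			       bool selinux_enabled)
{
	assert(name != NULL);

	/* Rule kept by the patch (LU-549): hide security.selinux
	 * while SELinux is disabled. */
	if (type == DEMO_SECURITY && !selinux_enabled &&
	    strcmp(name, "selinux") == 0)
		return false;

	/* There is no longer a special case for "capability". */
	return true;
}
```

In this model security.capability is always forwarded, while security.selinux is forwarded only when SELinux is enabled.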

[PATCH 14/22] staging: lustre: llite: record in stats attempted removal of lma/link xattr

2018-04-15 Thread James Simmons

Record attempted deletions of the lma/link xattrs in the
stats, in addition to attempted changes.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 4b1e565..3ab7ae0 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -296,7 +296,10 @@ static int ll_xattr_set(const struct xattr_handler 
*handler,
return ll_setstripe_ea(dentry, (struct lov_user_md *)value,
   size);
} else if (!strcmp(name, "lma") || !strcmp(name, "link")) {
-   ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_SETXATTR, 1);
+   int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR :
+  LPROC_LL_SETXATTR;
+
+   ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1);
return 0;
}
 
-- 
1.8.3.1

[PATCH 12/22] staging: lustre: llite: fix invalid size test in ll_setstripe_ea()

2018-04-15 Thread James Simmons

The size check at the start of ll_setstripe_ea() is only
valid for a directory. Move that check to the section of
code handling the S_ISDIR case.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 69c5227..42a6fb4 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -234,9 +234,6 @@ static int ll_setstripe_ea(struct dentry *dentry, struct 
lov_user_md *lump,
struct inode *inode = d_inode(dentry);
int rc = 0;
 
-   if (size != 0 && size < sizeof(struct lov_user_md))
-   return -EINVAL;
-
/*
 * It is possible to set an xattr to a "" value of zero size.
 * For this case we are going to treat it as a removal.
@@ -269,6 +266,9 @@ static int ll_setstripe_ea(struct dentry *dentry, struct 
lov_user_md *lump,
if (rc == -EEXIST)
rc = 0;
} else if (S_ISDIR(inode->i_mode)) {
+   if (size != 0 && size < sizeof(struct lov_user_md))
+   return -EINVAL;
+
rc = ll_dir_setstripe(inode, lump, 0);
}
 
-- 
1.8.3.1

[PATCH 08/22] staging: lustre: llite: add simple comment about lustre.lov xattrs

2018-04-15 Thread James Simmons

From: Niu Yawei 

Simple comment added to ll_xattr_set.

Signed-off-by: Bobi Jam 
Signed-off-by: Niu Yawei 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8998
Reviewed-on: https://review.whamcloud.com/24851
Reviewed-by: Andreas Dilger 
Reviewed-by: Lai Siyao 
Reviewed-by: Jinshan Xiong 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 1b462e4..c1600b9 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -264,6 +264,7 @@ static int ll_xattr_set(const struct xattr_handler *handler,
CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), xattr %s\n",
   PFID(ll_inode2fid(inode)), inode, name);
 
+   /* lustre/trusted.lov.xxx would be passed through xattr API */
if (!strcmp(name, "lov")) {
int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR :
   LPROC_LL_SETXATTR;
-- 
1.8.3.1

[PATCH 11/22] staging: lustre: llite: eat -EEXIST on setting trusted.lov

2018-04-15 Thread James Simmons

From: Bobi Jam 

Tools like rsync, tar, cp may copy and restore the xattrs on a file.
The client previously ignored the setting of trusted.lov/lustre.lov
if the layout had already been specified, to avoid causing these
tools to fail for no reason.

For PFL files we still need to silently eat -EEXIST on setting these
attributes to avoid problems.

Signed-off-by: Bobi Jam 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484
Reviewed-on: https://review.whamcloud.com/27126
Reviewed-by: Dmitry Eremin 
Reviewed-by: Niu Yawei 
Reviewed-by: James Simmons 
Reviewed-by: Andreas Dilger 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 56ac07e..69c5227 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -254,12 +254,20 @@ static int ll_setstripe_ea(struct dentry *dentry, struct 
lov_user_md *lump,
 
lum_size = ll_lov_user_md_size(lump);
if (lum_size < 0 || size < lum_size)
-   return 0; /* b=10667: ignore error */
+   return -ERANGE;
 
rc = ll_lov_setstripe_ea_info(inode, dentry, it_flags, lump,
  lum_size);
-   /* b=10667: rc always be 0 here for now */
-   rc = 0;
+   /**
+* b=10667: ignore -EEXIST.
+* Silently eat error on setting trusted.lov/lustre.lov
+* attribute for platforms that added the default option
+* to copy all attributes in 'cp' command. Both rsync and
+* tar --xattrs also will try to set LOVEA for existing
+* files.
+*/
+   if (rc == -EEXIST)
+   rc = 0;
} else if (S_ISDIR(inode->i_mode)) {
rc = ll_dir_setstripe(inode, lump, 0);
}
-- 
1.8.3.1
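
Reader's note: the behaviour change in this patch is small enough to model directly. The sketch below is an assumption-laden illustration, not the llite code: demo_check_lum_size() and demo_filter_setstripe_rc() are hypothetical helpers that isolate the two changed decisions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Size check: a bad or too-large layout size is now reported as
 * -ERANGE, where the old code returned 0 and hid the problem.
 */
static int demo_check_lum_size(int lum_size, size_t size)
{
	if (lum_size < 0 || size < (size_t)lum_size)
		return -ERANGE;
	return 0;
}

/*
 * Result filter: only -EEXIST is swallowed, so that tools restoring
 * xattrs on files that already have a layout do not fail. Every other
 * error is passed through, where the old code forced rc to 0.
 */
static int demo_filter_setstripe_rc(int rc)
{
	if (rc == -EEXIST)
		return 0;
	return rc;
}
```

So an error such as -ENOSPC now reaches the caller, while -EEXIST still maps to success.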

[PATCH 13/22] staging: lustre: llite: remove newline in fullname strings

2018-04-15 Thread James Simmons

When the full name of an xattr was built, a trailing newline
was appended. The remote MDS server saw the newline as part of
the name, which confused it. Remove the newline.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 42a6fb4..4b1e565 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -136,7 +136,7 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
return -EPERM;
}
 
-   fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name);
+   fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name);
if (!fullname)
return -ENOMEM;
rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
@@ -435,7 +435,7 @@ static int ll_xattr_get_common(const struct xattr_handler 
*handler,
if (handler->flags == XATTR_ACL_DEFAULT_T && !S_ISDIR(inode->i_mode))
return -ENODATA;
 #endif
-   fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name);
+   fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name);
if (!fullname)
return -ENOMEM;
rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size,
-- 
1.8.3.1
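
Reader's note: a user-space sketch of the name construction after the fix, using snprintf() in place of kasprintf(). The helper demo_fullname() and its truncation convention (-1) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "<prefix><name>" into buf without any trailing newline.
 * Returns 0 on success, -1 if the result did not fit.
 */
static int demo_fullname(char *buf, size_t len, const char *prefix,
			 const char *name)
{
	int n;

	assert(buf && prefix && name);

	n = snprintf(buf, len, "%s%s", prefix, name);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With the old "%s%s\n" format the string for prefix "trusted." and name "lov" would be one byte longer and would no longer compare equal to "trusted.lov", which is presumably why the server could not match the attribute.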

[PATCH 19/22] staging: lustre: llite: add support set_acl method in inode operations

2018-04-15 Thread James Simmons

From: Dmitry Eremin 

Linux kernel v3.14 added the set_acl method to inode operations.
This patch adds set_acl support to Lustre for proper ACL management.

Signed-off-by: Dmitry Eremin 
Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/25965
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10541
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Bob Glossman 
Reviewed-by: James Simmons 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/file.c | 67 ++
 .../staging/lustre/lustre/llite/llite_internal.h   |  4 ++
 drivers/staging/lustre/lustre/llite/namei.c| 10 +++-
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index 0026fde..35f5bda 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3030,6 +3030,7 @@ static int ll_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
return rc;
 }
 
+#ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type)
 {
struct ll_inode_info *lli = ll_i2info(inode);
@@ -3043,6 +3044,69 @@ struct posix_acl *ll_get_acl(struct inode *inode, int 
type)
return acl;
 }
 
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)
+{
+   struct ll_sb_info *sbi = ll_i2sbi(inode);
+   struct ptlrpc_request *req = NULL;
+   const char *name = NULL;
+   size_t value_size = 0;
+   char *value = NULL;
+   int rc;
+
+   switch (type) {
+   case ACL_TYPE_ACCESS:
+   name = XATTR_NAME_POSIX_ACL_ACCESS;
+   if (acl) {
+   rc = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+   if (rc)
+   goto out;
+   }
+
+   break;
+
+   case ACL_TYPE_DEFAULT:
+   name = XATTR_NAME_POSIX_ACL_DEFAULT;
+   if (!S_ISDIR(inode->i_mode)) {
+   rc = acl ? -EACCES : 0;
+   goto out;
+   }
+
+   break;
+
+   default:
+   rc = -EINVAL;
+   goto out;
+   }
+
+   if (acl) {
+   value_size = posix_acl_xattr_size(acl->a_count);
+   value = kmalloc(value_size, GFP_NOFS);
+   if (!value) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   rc = posix_acl_to_xattr(&init_user_ns, acl, value, value_size);
+   if (rc < 0)
+   goto out_value;
+   }
+
+   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
+value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
+name, value, value_size, 0, 0, 0, &req);
+
+   ptlrpc_req_finished(req);
+out_value:
+   kfree(value);
+out:
+   if (!rc)
+   set_cached_acl(inode, type, acl);
+   else
+   forget_cached_acl(inode, type);
+   return rc;
+}
+#endif /* CONFIG_FS_POSIX_ACL */
+
 int ll_inode_permission(struct inode *inode, int mask)
 {
struct ll_sb_info *sbi;
@@ -3164,7 +3228,10 @@ int ll_inode_permission(struct inode *inode, int mask)
.permission = ll_inode_permission,
.listxattr  = ll_listxattr,
.fiemap = ll_fiemap,
+#ifdef CONFIG_FS_POSIX_ACL
.get_acl= ll_get_acl,
+   .set_acl= ll_set_acl,
+#endif
 };
 
 /* dynamic ioctl number support routines */
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h 
b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 6504850..2280327 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -754,7 +754,11 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, __u64 
bits,
 int ll_md_real_close(struct inode *inode, fmode_t fmode);
 int ll_getattr(const struct path *path, struct kstat *stat,
   u32 request_mask, unsigned int flags);
+#ifdef CONFIG_FS_POSIX_ACL
 struct posix_acl *ll_get_acl(struct inode *inode, int type);
+int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type);
+#endif /* CONFIG_FS_POSIX_ACL */
+
 int ll_migrate(struct inode *parent, struct file *file, int mdtidx,
   const char *name, int namelen);
 int ll_get_fid_by_name(struct inode *parent, const char *name,
diff --git a/drivers/staging/lustre/lustre/llite/namei.c 
b/drivers/staging/lustre/lustre/llite/namei.c
index 6c9ec46..d7c4c58 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -1190,7 +1190,10 @@ static int ll_rename(struct inode *src, struct dentry 
*src_dchild,
.getattr= ll_getattr,
.permission  = ll_inode_permission,
.listxattr

[PATCH 20/22] staging: lustre: llite: use xattr_handler name for ACLs

2018-04-15 Thread James Simmons

From: "John L. Hammond" 

If struct xattr_handler has a name member then use it (rather than
prefix) for the ACL xattrs. This avoids a bug where ACL operations
failed for some kernels.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10785
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index d08bf1e..e835c8e 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -46,15 +46,16 @@
 
 const struct xattr_handler *get_xattr_type(const char *name)
 {
-   int i = 0;
+   int i;
 
-   while (ll_xattr_handlers[i]) {
-   size_t len = strlen(ll_xattr_handlers[i]->prefix);
+   for (i = 0; ll_xattr_handlers[i]; i++) {
+   const char *prefix = xattr_prefix(ll_xattr_handlers[i]);
+   size_t prefix_len = strlen(prefix);
 
-   if (!strncmp(ll_xattr_handlers[i]->prefix, name, len))
+   if (!strncmp(prefix, name, prefix_len))
return ll_xattr_handlers[i];
-   i++;
}
+
return NULL;
 }
 
@@ -627,14 +628,14 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, 
size_t size)
 };
 
 static const struct xattr_handler ll_acl_access_xattr_handler = {
-   .prefix = XATTR_NAME_POSIX_ACL_ACCESS,
+   .name = XATTR_NAME_POSIX_ACL_ACCESS,
.flags = XATTR_ACL_ACCESS_T,
.get = ll_xattr_get_common,
.set = ll_xattr_set_common,
 };
 
 static const struct xattr_handler ll_acl_default_xattr_handler = {
-   .prefix = XATTR_NAME_POSIX_ACL_DEFAULT,
+   .name = XATTR_NAME_POSIX_ACL_DEFAULT,
.flags = XATTR_ACL_DEFAULT_T,
.get = ll_xattr_get_common,
.set = ll_xattr_set_common,
-- 
1.8.3.1
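
Reader's note: the lookup after this patch can be sketched in user space as below. The struct and the two sample handlers are hypothetical; demo_xattr_prefix() follows the "prefix if set, otherwise name" rule of the kernel's xattr_prefix() helper, and "system.posix_acl_access" is the value of XATTR_NAME_POSIX_ACL_ACCESS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down handler: exactly one of name/prefix is set. */
struct demo_handler {
	const char *name;	/* exact attribute name, or NULL */
	const char *prefix;	/* namespace prefix, or NULL */
};

/* Prefix if set, otherwise the full name. */
static const char *demo_xattr_prefix(const struct demo_handler *h)
{
	return h->prefix ? h->prefix : h->name;
}

static const struct demo_handler demo_user = { .prefix = "user." };
static const struct demo_handler demo_acl_access = {
	.name = "system.posix_acl_access",
};

/* NULL-terminated table, like ll_xattr_handlers in the diff. */
static const struct demo_handler *demo_handlers[] = {
	&demo_user,
	&demo_acl_access,
	NULL,
};

static const struct demo_handler *demo_get_xattr_type(const char *name)
{
	int i;

	assert(name != NULL);

	for (i = 0; demo_handlers[i]; i++) {
		const char *prefix = demo_xattr_prefix(demo_handlers[i]);
		size_t prefix_len = strlen(prefix);

		if (!strncmp(prefix, name, prefix_len))
			return demo_handlers[i];
	}

	return NULL;
}
```

Reading ->prefix directly for a handler that only sets ->name would pass NULL to strlen(), which is one plausible way for the old loop to go wrong once the ACL handlers switch to .name.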

[PATCH 18/22] staging: lustre: llite: style changes in xattr.c

2018-04-15 Thread James Simmons

Small style changes to better match the kernel coding style
and make the code more readable.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 835d00f..d08bf1e 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -81,11 +81,10 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
return 0;
 }
 
-static int
-ll_xattr_set_common(const struct xattr_handler *handler,
-   struct dentry *dentry, struct inode *inode,
-   const char *name, const void *value, size_t size,
-   int flags)
+static int ll_xattr_set_common(const struct xattr_handler *handler,
+  struct dentry *dentry, struct inode *inode,
+  const char *name, const void *value, size_t size,
+  int flags)
 {
struct ll_sb_info *sbi = ll_i2sbi(inode);
struct ptlrpc_request *req = NULL;
@@ -139,9 +138,9 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name);
if (!fullname)
return -ENOMEM;
-   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
-valid, fullname, pv, size, 0, flags,
-ll_i2suppgid(inode), &req);
+
+   rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), valid, fullname,
+pv, size, 0, flags, ll_i2suppgid(inode), &req);
kfree(fullname);
if (rc) {
if (rc == -EOPNOTSUPP && handler->flags == XATTR_USER_T) {
@@ -307,9 +306,8 @@ static int ll_xattr_set(const struct xattr_handler *handler,
   flags);
 }
 
-int
-ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
- size_t size, u64 valid)
+int ll_xattr_list(struct inode *inode, const char *name, int type, void 
*buffer,
+ size_t size, u64 valid)
 {
struct ll_inode_info *lli = ll_i2info(inode);
struct ll_sb_info *sbi = ll_i2sbi(inode);
@@ -439,6 +437,7 @@ static int ll_xattr_get_common(const struct xattr_handler 
*handler,
fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name);
if (!fullname)
return -ENOMEM;
+
rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size,
   OBD_MD_FLXATTR);
kfree(fullname);
@@ -562,6 +561,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, 
size_t size)
   OBD_MD_FLXATTRLS);
if (rc < 0)
return rc;
+
/*
 * If we're being called to get the size of the xattr list
 * (size == 0) then just assume that a lustre.lov xattr
-- 
1.8.3.1

[PATCH 16/22] staging: lustre: llite: use proper types in the xattr code

2018-04-15 Thread James Simmons

Convert __uXX types to uXX types since this is kernel code.
The function ll_lov_user_md_size() returns ssize_t so change
lum_size from int to ssize_t.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 147ffcc..d6cee3b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -91,7 +91,7 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
struct ptlrpc_request *req = NULL;
const char *pv = value;
char *fullname;
-   __u64 valid;
+   u64 valid;
int rc;
 
if (flags == XATTR_REPLACE) {
@@ -246,8 +246,8 @@ static int ll_setstripe_ea(struct dentry *dentry, struct 
lov_user_md *lump,
return rc;
 
if (lump && S_ISREG(inode->i_mode)) {
-   __u64 it_flags = FMODE_WRITE;
-   int lum_size;
+   u64 it_flags = FMODE_WRITE;
+   ssize_t lum_size;
 
lum_size = ll_lov_user_md_size(lump);
if (lum_size < 0 || size < lum_size)
@@ -309,7 +309,7 @@ static int ll_xattr_set(const struct xattr_handler *handler,
 
 int
 ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer,
- size_t size, __u64 valid)
+ size_t size, u64 valid)
 {
struct ll_inode_info *lli = ll_i2info(inode);
struct ll_sb_info *sbi = ll_i2sbi(inode);
-- 
1.8.3.1

[PATCH 17/22] staging: lustre: llite: cleanup xattr code comments

2018-04-15 Thread James Simmons

Add proper punctuation to the comments. Change buf_size to size
in the ll_listxattr() comment, since buf_size doesn't exist and
would confuse someone reading the code.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index d6cee3b..835d00f 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -564,7 +564,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, 
size_t size)
return rc;
/*
 * If we're being called to get the size of the xattr list
-* (buf_size == 0) then just assume that a lustre.lov xattr
+* (size == 0) then just assume that a lustre.lov xattr
 * exists.
 */
if (!size)
@@ -577,14 +577,14 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, 
size_t size)
len = strnlen(xattr_name, rem - 1) + 1;
rem -= len;
if (!xattr_type_filter(sbi, get_xattr_type(xattr_name))) {
-   /* Skip OK xattr type leave it in buffer */
+   /* Skip OK xattr type, leave it in buffer. */
xattr_name += len;
continue;
}
 
/*
 * Move up remaining xattrs in buffer
-* removing the xattr that is not OK
+* removing the xattr that is not OK.
 */
memmove(xattr_name, xattr_name + len, rem);
rc -= len;
-- 
1.8.3.1
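
The comments touched by this patch describe an in-place filter over a buffer of NUL-terminated xattr names: allowed names stay where they are, and a rejected name is removed by moving the rest of the buffer up over it. A minimal userspace sketch of that idea follows. demo_hide() and demo_filter_list() are hypothetical names, and the "keep only user.* names" policy is an assumption made for illustration. It stands in for xattr_type_filter() and does not reproduce it.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical policy: nonzero means "hide this name from the caller". */
static int demo_hide(const char *name)
{
	return strncmp(name, "user.", 5) != 0;
}

/*
 * Filter a buffer of back-to-back NUL-terminated names in place.
 * 'used' is the number of valid bytes; the new byte count is returned.
 */
static ssize_t demo_filter_list(char *buffer, ssize_t used)
{
	char *xattr_name = buffer;
	ssize_t rem = used;

	assert(buffer && used >= 0);
	while (rem > 0) {
		/* Length of this entry including its terminating NUL. */
		ssize_t len = (ssize_t)strnlen(xattr_name, (size_t)rem - 1) + 1;

		rem -= len;
		if (!demo_hide(xattr_name)) {
			/* Skip an OK name, leave it in the buffer. */
			xattr_name += len;
			continue;
		}

		/* Move the remaining names up, removing the rejected one. */
		memmove(xattr_name, xattr_name + len, (size_t)rem);
		used -= len;
	}

	return used;
}
```

Note that xattr_name is not advanced after a removal, because the next entry has just been moved to the current position.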

[PATCH 21/22] staging: lustre: llite: correct removexattr detection

2018-04-15 Thread James Simmons

In ll_xattr_set_common() detect the removexattr() case correctly by
testing for a NULL value as well as XATTR_REPLACE.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10787
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index e835c8e..1a597a6 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -94,7 +94,11 @@ static int ll_xattr_set_common(const struct xattr_handler 
*handler,
u64 valid;
int rc;
 
-   if (flags == XATTR_REPLACE) {
+   /* When setxattr() is called with a size of 0 the value is
+* unconditionally replaced by "". When removexattr() is
+* called we get a NULL value and XATTR_REPLACE for flags.
+*/
+   if (!value && flags == XATTR_REPLACE) {
ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_REMOVEXATTR, 1);
valid = OBD_MD_FLXATTRRM;
} else {
-- 
1.8.3.1
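
The distinction this patch draws can be shown in isolation. The sketch below is a userspace model under stated assumptions, not the Lustre function: demo_classify() and enum demo_op are hypothetical, and the DEMO_XATTR_* constants are local stand-ins that mirror the values of XATTR_CREATE and XATTR_REPLACE. It shows why testing the flags alone misclassifies a setxattr() call made with XATTR_REPLACE as a removal.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring XATTR_CREATE and XATTR_REPLACE. */
#define DEMO_XATTR_CREATE  0x1
#define DEMO_XATTR_REPLACE 0x2

enum demo_op { DEMO_OP_SET, DEMO_OP_REMOVE };

/*
 * Classify a handler ->set() call. A removal arrives as a NULL value
 * together with the replace flag; a replace with a real (possibly
 * empty) value is still a set.
 */
static enum demo_op demo_classify(const void *value, size_t size, int flags)
{
	/* A NULL value with a nonzero size would be a caller bug. */
	assert(value || size == 0);

	if (!value && flags == DEMO_XATTR_REPLACE)
		return DEMO_OP_REMOVE;

	return DEMO_OP_SET;
}
```

Under the old test (flags == XATTR_REPLACE only), the second case in a call such as demo_classify("", 0, DEMO_XATTR_REPLACE) would have been treated as a removal.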

[PATCH 22/22] staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

2018-04-15 Thread James Simmons

From: "John L. Hammond" 

md_getxattr() and md_setxattr() each have several unused
parameters. Remove them and improve the naming of the remaining
parameters.

Signed-off-by: John L. Hammond 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10792
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin 
Reviewed-by: James Simmons 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/include/obd.h   |  7 ++---
 drivers/staging/lustre/lustre/include/obd_class.h | 21 ++
 drivers/staging/lustre/lustre/llite/file.c|  5 ++--
 drivers/staging/lustre/lustre/llite/xattr.c   |  6 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c   | 22 +++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 34 +--
 6 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h 
b/drivers/staging/lustre/lustre/include/obd.h
index 48cf7ab..0f9e5dc 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -935,12 +935,11 @@ struct md_ops {
  struct ptlrpc_request **);
 
int (*setxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int, __u32,
-   struct ptlrpc_request **);
+   u64, const char *, const void *, size_t, unsigned int,
+   u32, struct ptlrpc_request **);
 
int (*getxattr)(struct obd_export *, const struct lu_fid *,
-   u64, const char *, const char *, int, int, int,
-   struct ptlrpc_request **);
+   u64, const char *, size_t, struct ptlrpc_request **);
 
int (*init_ea_size)(struct obd_export *, u32, u32);
 
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h 
b/drivers/staging/lustre/lustre/include/obd_class.h
index a76f016..0081578 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1385,29 +1385,26 @@ static inline int md_merge_attr(struct obd_export *exp,
 }
 
 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags, __u32 suppgid,
+ u64 obd_md_valid, const char *name,
+ const char *value, size_t value_size,
+ unsigned int xattr_flags, u32 suppgid,
  struct ptlrpc_request **request)
 {
EXP_CHECK_MD_OP(exp, setxattr);
EXP_MD_COUNTER_INCREMENT(exp, setxattr);
-   return MDP(exp->exp_obd, setxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
+   return MDP(exp->exp_obd, setxattr)(exp, fid, obd_md_valid, name,
+  value, value_size, xattr_flags,
   suppgid, request);
 }
 
 static inline int md_getxattr(struct obd_export *exp, const struct lu_fid *fid,
- u64 valid, const char *name,
- const char *input, int input_size,
- int output_size, int flags,
- struct ptlrpc_request **request)
+ u64 obd_md_valid, const char *name,
+ size_t buf_size, struct ptlrpc_request **req)
 {
EXP_CHECK_MD_OP(exp, getxattr);
EXP_MD_COUNTER_INCREMENT(exp, getxattr);
-   return MDP(exp->exp_obd, getxattr)(exp, fid, valid, name, input,
-  input_size, output_size, flags,
-  request);
+   return MDP(exp->exp_obd, getxattr)(exp, fid, obd_md_valid, name,
+  buf_size, req);
 }
 
 static inline int md_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/file.c 
b/drivers/staging/lustre/lustre/llite/file.c
index 35f5bda..9197891 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3093,7 +3093,7 @@ int ll_set_acl(struct inode *inode, struct posix_acl 
*acl, int type)
 
rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
 value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
-name, value, value_size, 0, 0, 0, &req);
+name, value, value_size, 0, 0, &req);
 
ptlrpc_req_finished(req);
 out_value:
@@ -3405,8 +3405,7 @@ static int ll_layout_fetch(struct inode *inode, struct 
ldlm_lock *lock)
rc = ll_get_default_mdsize(sbi, );
if (rc == 0)
rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode),
-

[PATCH 15/22] staging: lustre: llite: cleanup posix acl xattr code

2018-04-15 Thread James Simmons

Having an extra ifdef makes the code harder to read. In
ll_xattr_get_common() a variable is initialized at the start of
the function but is only used in the XATTR_ACL_ACCESS_T code
block. Let's move that variable into that block, since it's only
used there, to make the code cleaner.

Signed-off-by: James Simmons 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin 
Reviewed-by: Bob Glossman 
Reviewed-by: Sebastien Buisson 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/lustre/llite/xattr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c 
b/drivers/staging/lustre/lustre/llite/xattr.c
index 3ab7ae0..147ffcc 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -396,9 +396,6 @@ static int ll_xattr_get_common(const struct xattr_handler 
*handler,
   const char *name, void *buffer, size_t size)
 {
struct ll_sb_info *sbi = ll_i2sbi(inode);
-#ifdef CONFIG_FS_POSIX_ACL
-   struct ll_inode_info *lli = ll_i2info(inode);
-#endif
char *fullname;
int rc;
 
@@ -422,6 +419,7 @@ static int ll_xattr_get_common(const struct xattr_handler 
*handler,
 * chance that cached ACL is uptodate.
 */
if (handler->flags == XATTR_ACL_ACCESS_T) {
+   struct ll_inode_info *lli = ll_i2info(inode);
struct posix_acl *acl;
 
spin_lock(&lli->lli_lock);
-- 
1.8.3.1

[PATCH 01/25] staging: lustre: libcfs: remove useless CPU partition code

2018-04-15 Thread James Simmons

From: Dmitry Eremin 

* remove the scratch buffer and the mutex which guards it.
* remove the global cpumask and the spinlock which guards it.
* remove cpt_version, which was used to detect CPU state changes
  during setup; CPU state changes are now simply disabled during setup.
* remove the whole global struct cfs_cpt_data cpt_data.
* remove a few unused APIs.

Signed-off-by: Dmitry Eremin 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-on: https://review.whamcloud.com/25048
Reviewed-by: James Simmons 
Reviewed-by: Doug Oucharek 
Reviewed-by: Andreas Dilger 
Reviewed-by: Olaf Weber 
Reviewed-by: Oleg Drokin 
Signed-off-by: James Simmons 
---
 .../lustre/include/linux/libcfs/libcfs_cpu.h   |  13 +--
 .../lustre/include/linux/libcfs/linux/linux-cpu.h  |   2 -
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c|  18 +---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 114 +++--
 4 files changed, 20 insertions(+), 127 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 61bce77..1f2cd78 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -162,12 +162,12 @@ struct cfs_cpt_table {
  * return 1 if successfully set all CPUs, otherwise return 0
  */
 int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab,
-   int cpt, cpumask_t *mask);
+   int cpt, const cpumask_t *mask);
 /**
  * remove all cpus in \a mask from CPU partition \a cpt
  */
 void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
-  int cpt, cpumask_t *mask);
+  int cpt, const cpumask_t *mask);
 /**
  * add all cpus in NUMA node \a node to CPU partition \a cpt
  * return 1 if successfully set all CPUs, otherwise return 0
@@ -190,20 +190,11 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
 void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
int cpt, nodemask_t *mask);
 /**
- * unset all cpus for CPU partition \a cpt
- */
-void cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt);
-/**
  * convert partition id \a cpt to numa node id, if there are more than one
  * nodes in this partition, it might return a different node id each time.
  */
 int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
 
-/**
- * return number of HTs in the same core of \a cpu
- */
-int cfs_cpu_ht_nsiblings(int cpu);
-
 /*
  * allocate per-cpu-partition data, returned value is an array of pointers,
  * variable can be indexed by CPU ID.
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
index 6035376..e8bbbaa 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
@@ -58,8 +58,6 @@ struct cfs_cpu_partition {
 
 /** descriptor for CPU partitions */
 struct cfs_cpt_table {
-   /* version, reserved for hotplug */
-   unsigned intctb_version;
/* spread rotor for NUMA allocator */
unsigned intctb_spread_rotor;
/* # of CPU partitions */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 76291a3..705abf2 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -129,14 +129,15 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_unset_cpu);
 
 int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, const cpumask_t *mask)
 {
return 1;
 }
 EXPORT_SYMBOL(cfs_cpt_set_cpumask);
 
 void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+ const cpumask_t *mask)
 {
 }
 EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
@@ -167,12 +168,6 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_unset_nodemask);
 
-void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
-}
-EXPORT_SYMBOL(cfs_cpt_clear);
-
 int
 cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
 {
@@ -181,13 +176,6 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_spread_node);
 
 int
-cfs_cpu_ht_nsiblings(int cpu)
-{
-   return 1;
-}
-EXPORT_SYMBOL(cfs_cpu_ht_nsiblings);
-
-int
 cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
 {
return 0;
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 388521e..134b239 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -64,30 +64,6 @@
 module_param(cpu_pattern, charp, 0444);

[PATCH 03/25] staging: lustre: libcfs: implement cfs_cpt_cpumask for UMP case

2018-04-15 Thread James Simmons

From: Amir Shehata 

The function cfs_cpt_cpumask() exists for SMP systems, but when
CONFIG_SMP is disabled it only returns NULL. Fill in this missing
function. Also properly initialize ctb_mask for the UMP
case.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
 drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h | 16 +---
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c  |  9 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 1f2cd78..070f8fe 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -77,10 +77,6 @@
 
 #ifdef CONFIG_SMP
 /**
- * return cpumask of CPU partition \a cpt
- */
-cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
-/**
  * print string information of cpt-table
  */
 int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
@@ -89,19 +85,13 @@ struct cfs_cpt_table {
/* # of CPU partitions */
int ctb_nparts;
/* cpu mask */
-   cpumask_t   ctb_mask;
+   cpumask_var_t   ctb_mask;
/* node mask */
nodemask_t  ctb_nodemask;
/* version */
u64 ctb_version;
 };
 
-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
-{
-   return NULL;
-}
-
 static inline int
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
@@ -133,6 +123,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt);
 /**
+ * return cpumask of CPU partition \a cpt
+ */
+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
+/**
  * return nodemask of CPU partition \a cpt
  */
 nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt);
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 705abf2..5ea294f 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -54,6 +54,9 @@ struct cfs_cpt_table *
cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
if (cptab) {
cptab->ctb_version = CFS_CPU_VERSION_MAGIC;
+   if (!zalloc_cpumask_var(&cptab->ctb_mask, GFP_NOFS))
+   return NULL;
+   cpumask_set_cpu(0, cptab->ctb_mask);
node_set(0, cptab->ctb_nodemask);
cptab->ctb_nparts  = ncpt;
}
@@ -108,6 +111,12 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_online);
 
+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+   return &cptab->ctb_mask;
+}
+EXPORT_SYMBOL(cfs_cpt_cpumask);
+
 nodemask_t *
 cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
 {
-- 
1.8.3.1

[PATCH 06/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids

2018-04-15 Thread James Simmons

From: Amir Shehata 

Move from num_possible_cpus() to nr_cpu_ids.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
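A note on why the two counts can differ, written as a userspace sketch
rather than kernel code. num_possible_cpus() is the number of bits set in
the possible-CPU mask, while nr_cpu_ids is one more than the highest
possible CPU id, so a table indexed by CPU id needs nr_cpu_ids slots when
the possible mask is sparse. The model_* names and the 64-bit mask type are
hypothetical stand-ins, not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set means CPU id n is possible. */
typedef uint64_t model_cpumask;

/* Models num_possible_cpus(): number of set bits in the mask. */
static unsigned int model_num_possible_cpus(model_cpumask possible)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none remain. */
	for (; possible; possible &= possible - 1)
		n++;
	return n;
}

/* Models nr_cpu_ids: highest possible CPU id plus one. */
static unsigned int model_nr_cpu_ids(model_cpumask possible)
{
	unsigned int ids = 0;

	for (; possible; possible >>= 1)
		ids++;
	return ids;
}

/* A table indexed by CPU id fits only if every set id is below nslots. */
static int model_table_fits(model_cpumask possible, unsigned int nslots)
{
	assert(nslots <= 64);
	if (nslots == 64)
		return 1;	/* avoid an undefined 64-bit shift */
	return (possible >> nslots) == 0;
}
```

With possible CPU ids 0, 2 and 5 (mask 0x25) the weight is 3 but the
highest id is 5, so a 3-slot table would be indexed out of bounds while a
6-slot table fits.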
 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index b2a88ef..741db69 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -105,14 +105,14 @@ struct cfs_cpt_table *
!cptab->ctb_nodemask)
goto failed;
 
-   cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+   cptab->ctb_cpu2cpt = kvmalloc_array(nr_cpu_ids,
sizeof(cptab->ctb_cpu2cpt[0]),
GFP_KERNEL);
if (!cptab->ctb_cpu2cpt)
goto failed;
 
memset(cptab->ctb_cpu2cpt, -1,
-  num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+  nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));
 
cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
  GFP_KERNEL);
-- 
1.8.3.1

[PATCH 08/25] staging: lustre: libcfs: add cpu distance handling

2018-04-15 Thread James Simmons

From: Amir Shehata 

Add functionality to calculate the distance between two CPTs.
Expose those distances in debugfs so people deploying a setup
can debug what is being created for CPTs.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
 .../lustre/include/linux/libcfs/libcfs_cpu.h   |  8 +++
 .../lustre/include/linux/libcfs/linux/linux-cpu.h  |  4 ++
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c| 21 
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 59 ++
 4 files changed, 92 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 839ec02..c0922fc 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -110,6 +110,10 @@ struct cfs_cpt_table {
  */
 struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
 /**
+ * print distance information of cpt-table
+ */
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len);
+/**
  * return total number of CPU partitions in \a cptab
  */
 int
@@ -143,6 +147,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
 /**
+ * NUMA distance between \a cpt1 and \a cpt2 in \a cptab
+ */
+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2);
+/**
  * bind current thread on a CPU-partition \a cpt of \a cptab
  */
 int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
index 1bed0ba..4ac1670 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
@@ -52,6 +52,8 @@ struct cfs_cpu_partition {
cpumask_var_t   cpt_cpumask;
/* nodes mask for this partition */
nodemask_t  *cpt_nodemask;
+   /* NUMA distance between CPTs */
+   unsigned int*cpt_distance;
/* spread rotor for NUMA allocator */
unsigned intcpt_spread_rotor;
 };
@@ -60,6 +62,8 @@ struct cfs_cpu_partition {
 struct cfs_cpt_table {
/* spread rotor for NUMA allocator */
unsigned intctb_spread_rotor;
+   /* maximum NUMA distance between all nodes in table */
+   unsigned intctb_distance;
/* # of CPU partitions */
unsigned intctb_nparts;
/* partitions tables */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index e6d1512..7ac2796 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -41,6 +41,8 @@
 
 #define CFS_CPU_VERSION_MAGIC 0xbabecafe
 
+#define CFS_CPT_DISTANCE   1   /* Arbitrary positive value */
+
 struct cfs_cpt_table *
 cfs_cpt_table_alloc(unsigned int ncpt)
 {
@@ -90,6 +92,19 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_table_print);
 #endif /* CONFIG_SMP */
 
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
+{
+   int rc;
+
+   rc = snprintf(buf, len, "0\t: 0:%d\n", CFS_CPT_DISTANCE);
+   len -= rc;
+   if (len <= 0)
+   return -EFBIG;
+
+   return rc;
+}
+EXPORT_SYMBOL(cfs_cpt_distance_print);
+
 int
 cfs_cpt_number(struct cfs_cpt_table *cptab)
 {
@@ -124,6 +139,12 @@ cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
 }
 EXPORT_SYMBOL(cfs_cpt_nodemask);
 
+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
+{
+   return CFS_CPT_DISTANCE;
+}
+EXPORT_SYMBOL(cfs_cpt_distance);
+
 int
 cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
 {
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index fd0c451..1e184b1 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -76,6 +76,7 @@
struct cfs_cpu_partition *part = &cptab->ctb_parts[i];
 
kfree(part->cpt_nodemask);
+   kfree(part->cpt_distance);
free_cpumask_var(part->cpt_cpumask);
}
 
@@ -137,6 +138,12 @@ struct cfs_cpt_table *
if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
!part->cpt_nodemask)
goto failed;
+
+   part->cpt_distance = kvmalloc_array(cptab->ctb_nparts,
+                                           sizeof(part->cpt_distance[0]),

[PATCH 09/25] staging: lustre: libcfs: use distance in cpu and node handling

2018-04-15 Thread James Simmons

From: Amir Shehata 

Take into consideration the location of NUMA nodes and cores
when calling cfs_cpt_[un]set_cpu() and cfs_cpt_[un]set_node().
This enables functioning on platforms with hundreds of cores and
NUMA nodes.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
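For readers following the distance bookkeeping, here is a userspace sketch
of the "maximum pairwise distance" rule that cfs_cpt_distance_calculate()
in the diff below applies to two node masks. The distance table, the
4-node limit, the plain unsigned bitmasks and the model_* names are all
hypothetical; the kernel code uses nodemask_t, for_each_node_mask() and
node_distance() instead.

```c
#include <assert.h>

#define MODEL_MAX_NODES 4

/* Hypothetical NUMA distance table (made-up values for illustration). */
static const unsigned int
model_node_distance[MODEL_MAX_NODES][MODEL_MAX_NODES] = {
	{ 10, 20, 30, 30 },
	{ 20, 10, 30, 30 },
	{ 30, 30, 10, 20 },
	{ 30, 30, 20, 10 },
};

/*
 * Maximum distance between any node in from_mask and any node in
 * to_mask. Bit n of a mask stands for node n. Returns 0 if either
 * mask is empty.
 */
static unsigned int model_distance_calculate(unsigned int from_mask,
					     unsigned int to_mask)
{
	unsigned int maximum = 0;
	int from, to;

	for (from = 0; from < MODEL_MAX_NODES; from++) {
		if (!(from_mask & (1u << from)))
			continue;
		for (to = 0; to < MODEL_MAX_NODES; to++) {
			if (!(to_mask & (1u << to)))
				continue;
			if (maximum < model_node_distance[from][to])
				maximum = model_node_distance[from][to];
		}
	}
	return maximum;
}
```

Passing the same mask for both arguments gives the largest distance inside
one set of nodes, which is how the table-wide ctb_distance is computed in
cfs_cpt_add_node().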
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 192 +++--
 1 file changed, 143 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 1e184b1..bbf89b8 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -300,11 +300,134 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
 }
 EXPORT_SYMBOL(cfs_cpt_distance);
 
+/*
+ * Calculate the maximum NUMA distance between all nodes in the
+ * from_mask and all nodes in the to_mask.
+ */
+static unsigned int cfs_cpt_distance_calculate(nodemask_t *from_mask,
+  nodemask_t *to_mask)
+{
+   unsigned int maximum;
+   unsigned int distance;
+   int from;
+   int to;
+
+   maximum = 0;
+   for_each_node_mask(from, *from_mask) {
+   for_each_node_mask(to, *to_mask) {
+   distance = node_distance(from, to);
+   if (maximum < distance)
+   maximum = distance;
+   }
+   }
+   return maximum;
+}
+
+static void cfs_cpt_add_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+   cptab->ctb_cpu2cpt[cpu] = cpt;
+
+   cpumask_set_cpu(cpu, cptab->ctb_cpumask);
+   cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+}
+
+static void cfs_cpt_del_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+   cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+   cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
+
+   cptab->ctb_cpu2cpt[cpu] = -1;
+}
+
+static void cfs_cpt_add_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+   struct cfs_cpu_partition *part;
+
+   if (!node_isset(node, *cptab->ctb_nodemask)) {
+   unsigned int dist;
+
+   /* first time node is added to the CPT table */
+   node_set(node, *cptab->ctb_nodemask);
+   cptab->ctb_node2cpt[node] = cpt;
+
+   dist = cfs_cpt_distance_calculate(cptab->ctb_nodemask,
+ cptab->ctb_nodemask);
+   cptab->ctb_distance = dist;
+   }
+
+   part = &cptab->ctb_parts[cpt];
+   if (!node_isset(node, *part->cpt_nodemask)) {
+   int cpt2;
+
+   /* first time node is added to this CPT */
+   node_set(node, *part->cpt_nodemask);
+   for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+   struct cfs_cpu_partition *part2;
+   unsigned int dist;
+
+   part2 = &cptab->ctb_parts[cpt2];
+   dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+   part->cpt_distance[cpt2] = dist;
+   dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+ part->cpt_nodemask);
+   part2->cpt_distance[cpt] = dist;
+   }
+   }
+}
+
+static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+   struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+   int cpu;
+
+   for_each_cpu(cpu, part->cpt_cpumask) {
+   /* this CPT has other CPU belonging to this node? */
+   if (cpu_to_node(cpu) == node)
+   break;
+   }
+
+   if (cpu >= nr_cpu_ids && node_isset(node,  *part->cpt_nodemask)) {
+   int cpt2;
+
+   /* No more CPUs in the node for this CPT. */
+   node_clear(node, *part->cpt_nodemask);
+   for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+   struct cfs_cpu_partition *part2;
+   unsigned int dist;
+
+   part2 = &cptab->ctb_parts[cpt2];
+   if (node_isset(node, *part2->cpt_nodemask))
+   cptab->ctb_node2cpt[node] = cpt2;
+
+   dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+ part2->cpt_nodemask);
+   part->cpt_distance[cpt2] = dist;
+   dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+                                                 part->cpt_nodemask);

[PATCH 11/25] staging: lustre: libcfs: invert error handling for cfs_cpt_table_print

2018-04-15 Thread James Simmons

From: Amir Shehata 

Instead of setting rc to -EFBIG for several cases in the loop, let's
initialize rc to -EFBIG and just break out of the loop in case of
failure. Just set rc to zero once we successfully finish the loop.

Signed-off-by: Amir Shehata 
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber 
Reviewed-by: Doug Oucharek 
Signed-off-by: James Simmons 
---
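The "assume failure, clear it on success" pattern described above can be
sketched in userspace as follows. This is an illustration under stated
assumptions, not the patched function: model_table_print() and its fixed
two-CPUs-per-partition input are hypothetical, and the sketch keeps the
snprintf() return value in a separate variable n (the diff below reuses rc
for it) so that rc still holds -EFBIG whenever the loop bails out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for cfs_cpt_table_print(): one line per
 * partition, each listing two CPU ids. Returns the number of bytes
 * written, or -EFBIG if the buffer is too small. The output is not
 * NUL-terminated after the final newline.
 */
static int model_table_print(const int cpus[][2], int nparts,
			     char *buf, int len)
{
	char *tmp = buf;
	int rc = -EFBIG;	/* assume failure until the loop completes */
	int n, i, j;

	assert(buf != NULL);

	for (i = 0; i < nparts; i++) {
		if (len <= 0)
			goto out;

		n = snprintf(tmp, len, "%d\t:", i);
		len -= n;
		if (len <= 0)
			goto out;
		tmp += n;

		for (j = 0; j < 2; j++) {
			n = snprintf(tmp, len, " %d", cpus[i][j]);
			len -= n;
			if (len <= 0)
				goto out;
			tmp += n;
		}

		/* Overwrite the terminating NUL with the line break. */
		*tmp++ = '\n';
		len--;
	}

	rc = 0;	/* every partition was printed */
out:
	if (rc < 0)
		return rc;
	return (int)(tmp - buf);
}
```

Every early exit shares the single out label, and only a complete pass
through the loop clears the error.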
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index bbf89b8..6d8dcd3 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -158,29 +158,26 @@ struct cfs_cpt_table *
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
char *tmp = buf;
-   int rc = 0;
+   int rc = -EFBIG;
int i;
int j;
 
for (i = 0; i < cptab->ctb_nparts; i++) {
-   if (len > 0) {
-   rc = snprintf(tmp, len, "%d\t:", i);
-   len -= rc;
-   }
+   if (len <= 0)
+   goto out;
+
+   rc = snprintf(tmp, len, "%d\t:", i);
+   len -= rc;
 
-   if (len <= 0) {
-   rc = -EFBIG;
+   if (len <= 0)
goto out;
-   }
 
tmp += rc;
for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
-   rc = snprintf(tmp, len, "%d ", j);
+   rc = snprintf(tmp, len, " %d", j);
len -= rc;
-   if (len <= 0) {
-   rc = -EFBIG;
+   if (len <= 0)
goto out;
-   }
tmp += rc;
}
 
@@ -189,6 +186,7 @@ struct cfs_cpt_table *
len--;
}
 
+   rc = 0;
  out:
if (rc < 0)
return rc;
-- 
1.8.3.1
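
For readers who want to try the pattern outside the kernel, here is a
minimal userspace sketch of the "preset the error code, clear it only
after the loop" idea described in the commit message. It is not the
libcfs code: struct demo_part and demo_table_print() are hypothetical
stand-ins, and a plain array replaces the cpumask iteration. In the diff
above rc is also reused for the snprintf() return value, which
overwrites the preset error code; the sketch keeps that count in a
separate variable (n) so the preset -EFBIG survives until the jump to
the out label. That split is a choice made for this sketch, not
something the patch does.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a CPU partition: a short list of CPU ids. */
struct demo_part {
	int ncpus;
	int cpus[4];
};

/*
 * Format each partition as "<index>\t: <cpu> <cpu>...\n" into buf.
 * Returns the number of bytes written, or -EFBIG if buf is too small.
 * The output is not NUL-terminated when it fills buf exactly.
 */
static int demo_table_print(const struct demo_part *parts, int nparts,
			    char *buf, int len)
{
	char *tmp = buf;
	int rc = -EFBIG;	/* assume failure until the loop completes */
	int n, i, j;

	for (i = 0; i < nparts; i++) {
		if (len <= 0)
			goto out;

		/* n is kept separate from rc so rc stays -EFBIG here */
		n = snprintf(tmp, len, "%d\t:", i);
		len -= n;
		if (len <= 0)
			goto out;
		tmp += n;

		for (j = 0; j < parts[i].ncpus; j++) {
			n = snprintf(tmp, len, " %d", parts[i].cpus[j]);
			len -= n;
			if (len <= 0)
				goto out;
			tmp += n;
		}

		/* len > 0 here, so one byte is still available */
		*tmp++ = '\n';
		len--;
	}

	rc = 0;			/* only reached when nothing overflowed */
out:
	if (rc < 0)
		return rc;

	return tmp - buf;
}
```

Calling demo_table_print() with a buffer that is too small returns
-EFBIG from any of the exit points without each one having to assign the
error code itself, which is the simplification the commit message is
after.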
