KASAN: use-after-free Read in tipc_nametbl_stop
Hello,

syzbot hit the following crash on net-next commit
5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d64b64afc55660106556

So far this crash happened 5 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6319968803094528
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6099825221173248
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4953018151731200
Kernel config: https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d64b64afc55660106...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

Failed to remove local publication {0,0,0}/20641
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
==
BUG: KASAN: use-after-free in tipc_service_delete net/tipc/name_table.c:751 [inline]
BUG: KASAN: use-after-free in tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780
Read of size 8 at addr 8801c4c25130 by task kworker/u4:2/30

CPU: 0 PID: 30 Comm: kworker/u4:2 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 tipc_service_delete net/tipc/name_table.c:751 [inline]
 tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780
 tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
 cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411

Allocated by task 4535:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 tipc_service_create_range net/tipc/name_table.c:183 [inline]
 tipc_service_insert_publ net/tipc/name_table.c:207 [inline]
 tipc_nametbl_insert_publ+0x569/0x1910 net/tipc/name_table.c:371
 tipc_nametbl_publish+0x6c3/0xba0 net/tipc/name_table.c:618
 tipc_sk_publish+0x22a/0x510 net/tipc/socket.c:2604
 tipc_bind+0x206/0x330 net/tipc/socket.c:647
 __sys_bind+0x331/0x440 net/socket.c:1484
 SYSC_bind net/socket.c:1495 [inline]
 SyS_bind+0x24/0x30 net/socket.c:1493
 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

Freed by task 30:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 tipc_service_remove_publ.isra.8+0x909/0xc30 net/tipc/name_table.c:283
 tipc_service_delete net/tipc/name_table.c:753 [inline]
 tipc_nametbl_stop+0x746/0xd70 net/tipc/name_table.c:780
 tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
 ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
 cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411

The buggy address belongs to the object at 8801c4c25100
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 48 bytes inside of
 64-byte region [8801c4c25100, 8801c4c25140)
The buggy address belongs to the page:
page:ea0007130940 count:1 mapcount:0 mapping:8801c4c25000 index:0x0
flags: 0x2fffc000100(slab)
raw: 02fffc000100 8801c4c25000 00010020
raw: ea0006ccf860 ea00070840a0 8801dac00340
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8801c4c25000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 8801c4c25080: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 8801c4c25100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                 ^
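
The numbers in the report above can be cross-checked by hand. The following is only a userspace sketch, not kernel code: the function and enum names are made up for illustration, and it assumes generic KASAN's layout where one shadow byte covers 8 bytes of memory, 0xfb marks a freed kmalloc object and 0xfc marks a kmalloc redzone. Addresses are used exactly as they appear in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_GRANULE 8u

/* Hypothetical names for the two shadow values seen in the report. */
enum shadow_kind { SHADOW_OTHER, SHADOW_FREED_OBJECT, SHADOW_KMALLOC_REDZONE };

/* Offset of the faulting address inside its slab object. */
uint64_t object_offset(uint64_t addr, uint64_t object_base)
{
	assert(addr >= object_base);
	return addr - object_base;
}

/* Index, within one "Memory state" row, of the shadow byte covering addr. */
size_t shadow_index(uint64_t addr, uint64_t row_base)
{
	assert(addr >= row_base);
	return (size_t)((addr - row_base) / SHADOW_GRANULE);
}

/* Interpret a shadow byte from the "Memory state" dump. */
enum shadow_kind classify_shadow(uint8_t shadow)
{
	switch (shadow) {
	case 0xfb:
		return SHADOW_FREED_OBJECT;
	case 0xfc:
		return SHADOW_KMALLOC_REDZONE;
	default:
		return SHADOW_OTHER;
	}
}
```

With addr 8801c4c25130 and object base 8801c4c25100 the offset comes out as 48, matching "located 48 bytes inside of 64-byte region", and the covering shadow byte is the seventh one in the last row, an fb, which is consistent with a read from an already freed kmalloc-64 object.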
Wrong module .text address in 4.16.0
I just installed 4.16.0 and discovered the module .text address is wrong. It happens on s390 and x86 platforms. I have not tested others.

Here is the issue. I have used module qeth_l2 on s390, which is the ethernet device driver:

[root@s35lp76 ~]# lsmod
Module                  Size  Used by
qeth_l2                94208  1
...
[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000   < This is the correct address in memory
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363   < This is the wrong address
[root@s35lp76 ~]#

File /sys/module/qeth_l2/sections/.text displays a very strange address which is definitely wrong. It should be something like 0x03ff80401xxx. Same on x86.

I have checked file kernel/module.c, function add_sect_attrs(), and it calls module_sect_show() when the sysfs file is read. module_sect_show() uses

  sprintf(buf, "0x%pK\n", (void *)sattr->address);

and my sysctl setting should be correct:

[root@s35lp76 linux]# sysctl -a | fgrep kernel.kptr_restrict
kernel.kptr_restrict = 0
[root@s35lp76 linux]#

I wonder if somebody else has seen this issue? Ideas how to fix this?

Thanks

--
Thomas Richter, Dept 3303, IBM LTC Boeblingen Germany
--
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294
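
One possible explanation, offered as an assumption rather than something stated in the report: as far as I know, since 4.15 plain %p prints a hashed value, and %pK with kptr_restrict = 0 takes the same hashed path, while %px prints the raw address. That would account for a short, unrelated-looking value such as 0x18ea8363. The sketch below is a userspace model of that decision only; toy_hash() is a stand-in (the kernel uses a keyed hash with a boot-time random key), and all names here are hypothetical. It is worth checking against lib/vsprintf.c before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Models the "%pK" and "%px" format specifiers. */
enum ptr_fmt { FMT_PK, FMT_PX };

/* Stand-in mixer, NOT the kernel's hash; truncates to 32 bits. */
uint32_t toy_hash(uint64_t addr, uint64_t key)
{
	uint64_t x = addr ^ key;

	x ^= x >> 33;
	x *= 0xff51afd7ed558ccdULL;
	x ^= x >> 33;
	return (uint32_t)x;
}

/*
 * Value that would be shown for addr under the assumed 4.15+ rules:
 *   %px                     -> raw address
 *   %pK, kptr_restrict == 0 -> hashed like %p
 *   %pK, kptr_restrict == 1 -> raw address for privileged readers, else 0
 *   %pK, kptr_restrict == 2 -> always 0
 */
uint64_t shown_value(enum ptr_fmt fmt, int kptr_restrict, int privileged,
		     uint64_t addr, uint64_t key)
{
	assert(kptr_restrict >= 0 && kptr_restrict <= 2);

	if (fmt == FMT_PX)
		return addr;
	if (kptr_restrict == 0)
		return toy_hash(addr, key);
	if (kptr_restrict == 1 && privileged)
		return addr;
	return 0;
}
```

Under this model, the setting kernel.kptr_restrict = 0 shown above is exactly the case in which %pK does not print the real address, so a fix would have to change the format specifier or the logic around it rather than the sysctl.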
Re: [PATCH v13 6/6] PCI/DPC: Do not do recovery for hotplug enabled system
On 2018-04-16 11:03, p...@codeaurora.org wrote:
On 2018-04-16 08:47, Bjorn Helgaas wrote:
On Sat, Apr 14, 2018 at 11:53:17AM -0400, Sinan Kaya wrote:

You indicated that you want to unify the AER and DPC behavior. Let's settle on what we want to do one more time. We have been going back and forth on the direction.

My thinking is that as much as possible, similar events should be handled similarly, whether the mechanism is AER, DPC, EEH, etc. Ideally, drivers shouldn't have to be aware of which mechanism is in use. Error recovery includes conventional PCI as well, but right now I think we're only concerned with PCIe.

The following error types are from PCIe r4.0, sec 6.2.2:

ERR_COR
  Corrected by hardware with no software intervention. Software involved for logging only. Handled by AER via pci_error_handlers; DPC is never involved. Link is unaffected.

ERR_NONFATAL
  A transaction is unreliable but the link is fully functional. If DPC is not supported, handled by AER via pci_error_handlers and the link is unaffected. If DPC is supported, handled by DPC (because we set PCI_EXP_DPC_CTL_EN_NONFATAL) via remove/re-enumerate.

ERR_FATAL
  The link is unreliable. If DPC is not supported, handled by AER via pci_error_handlers and the link is reset. If DPC is supported, handled by DPC via remove/re-enumerate.

It doesn't seem right to me that we handle both ERR_NONFATAL and ERR_FATAL events differently if we happen to have DPC support in a switch.

Maybe we should consider triggering DPC only on ERR_FATAL? That would keep DPC out of the ERR_NONFATAL cases.

For ERR_FATAL, maybe we should bite the bullet and use remove/re-enumerate for AER as well as for DPC. That would be painful for higher-level software, but if we're willing to accept that pain for new systems that support DPC, maybe life would be better overall if it worked the same way on systems without DPC?

Bjorn

This had crossed my mind when I first looked at the code. DPC is getting triggered for both the ERR_NONFATAL and ERR_FATAL cases. I thought the primary purpose of DPC was to recover from fatal errors by triggering HW recovery. But what if some platform wants to handle both FATAL and NON_FATAL with DPC?

As you said, AER FATAL cases and DPC FATAL cases should be handled similarly, e.g. remove/re-enumerate the devices, while in the NON_FATAL case only AER would come into the picture. If some platform would like to handle DPC NON_FATAL, then it should follow the AER NON_FATAL path (where it does not do remove/re-enumerate).

In the case where hotplug is enabled, remove/re-enumerate makes more sense for ERR_FATAL. In the case where hotplug is disabled, only re-enumeration is required (no need to remove the devices). But then do we need to handle this case specifically? What is the harm in removing the devices in all cases, followed by re-enumeration?

To clarify the last line: what I meant was that in case of ERR_FATAL we can always remove/re-enumerate the devices, irrespective of whether hotplug is enabled. In case of ERR_NONFATAL, DPC will follow the AER path (where it just tries to recover), although I am not sure how to handle the ERR_NONFATAL case if hotplug is enabled, because, as Keith suggested, the device might have been changed at run time.

Regards,
Oza.
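
To make the difference between today's handling and the unified handling discussed above easier to compare, here is a small decision-table sketch. It is an illustration only, not code from drivers/pci: the enum and function names are hypothetical, and proposed_policy() encodes just one reading of the suggestion (DPC stays out of ERR_NONFATAL, and ERR_FATAL always uses remove/re-enumerate).

```c
#include <assert.h>
#include <stdbool.h>

/* Error classes from PCIe r4.0, sec 6.2.2, as listed in the thread. */
enum pcie_err { ERR_COR, ERR_NONFATAL, ERR_FATAL };

/* Hypothetical labels for the recovery actions described in the thread. */
enum recovery {
	LOG_ONLY,                 /* logging via AER, link unaffected */
	AER_HANDLERS,             /* pci_error_handlers, link unaffected */
	AER_HANDLERS_LINK_RESET,  /* pci_error_handlers, link is reset */
	REMOVE_REENUMERATE        /* remove devices, then re-enumerate */
};

/* Behavior as described in the thread: it depends on DPC support. */
enum recovery current_policy(enum pcie_err err, bool dpc_supported)
{
	switch (err) {
	case ERR_COR:
		return LOG_ONLY;
	case ERR_NONFATAL:
		return dpc_supported ? REMOVE_REENUMERATE : AER_HANDLERS;
	case ERR_FATAL:
		return dpc_supported ? REMOVE_REENUMERATE
				     : AER_HANDLERS_LINK_RESET;
	}
	assert(0 && "unknown error class");
	return LOG_ONLY;
}

/* One reading of the proposal: same action with or without DPC. */
enum recovery proposed_policy(enum pcie_err err, bool dpc_supported)
{
	(void)dpc_supported; /* deliberately ignored */

	switch (err) {
	case ERR_COR:
		return LOG_ONLY;
	case ERR_NONFATAL:
		return AER_HANDLERS;
	case ERR_FATAL:
		return REMOVE_REENUMERATE;
	}
	assert(0 && "unknown error class");
	return LOG_ONLY;
}
```

The open question from the last paragraph, ERR_NONFATAL on a hotplug-enabled port, is not modelled here; it would need an extra input and the thread has not settled what the answer should be.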
linux-next: Tree for Apr 16
Hi all,

Changes since 20180413:

The bpf tree gained a build failure for which I applied a patch.

Non-merge commits (relative to Linus' tree): 379
 366 files changed, 7652 insertions(+), 4560 deletions(-)

I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 258 trees (counting Linus' and 44 trees of bug fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html .

Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (60cc43fc8884 Linux 4.17-rc1)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add correct dependency to Makefile)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (04552a693d60 ARM: kexec: record parent context registers for non-crash CPUs)
Merging arm64-fixes/for-next/fixes (e21da1c99200 arm64: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery)
Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" comment)
Merging powerpc-fixes/fixes (81b654c27391 powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features)
Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (c246fd333f84 filter.txt: update 'tools/net/' to 'tools/bpf/')
Merging bpf/master (700475af1bb5 Merge branch 'bpf-sockmap-fixes')
Applying: fix for "bpf: sockmap, map_release does not hold refcnt for pinned maps"
Merging ipsec/master (4b66af2d6356 af_key: Always verify length of provided sadb_key)
Merging netfilter/master (cf43ae63c024 netfilter: xt_connmark: Add bit mapping for bit-shift operation.)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set)
Merging wireless-drivers/master (77e30e10ee28 iwlwifi: mvm: query regdb for wmm rule if needed)
Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging rdma-fixes/for-rc (84652aefb347 RDMA/ucma: Introduce safer rdma_addr_size() variants)
Merging sound-current/for-linus (7ecb46e9ee9a ALSA: line6: Use correct endpoint type for midi output)
Merging pci-current/for-linus (adf58458bcb2 PCI: Remove messages about reassigning resources)
Merging driver-core.current/driver-core-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging tty.current/tty-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb.current/usb-linus (38c23685b273 Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (86d71233b615 USB: serial: ftdi_sio: add support for Harman FirmwareHubEmulator)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup)
Merging phy/fixes (59fba0869aca phy: qcom-ufs: add MODULE_LICENSE tag)
Merging staging.current/staging-linus (df34df483a97 Merge tag 'staging-4.17-rc1' of
Re: [PATCHv5] gpio: Remove VLA from gpiolib
On 16/04/2018 13:19, Phil Reid wrote:
G'day Laura,

One more comment.

On 16/04/2018 12:41, Phil Reid wrote:
G'day Laura,

On 14/04/2018 05:24, Laura Abbott wrote:
The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chip with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner
Signed-off-by: Lukas Wunner
Signed-off-by: Laura Abbott
---
v5: Dropped some outdated comments and extra whitespace. Switched to
ARCH_NR_GPIOS per suggestion of Linus Walleij.
---
 drivers/gpio/gpiolib.c        | 76 +--
 drivers/gpio/gpiolib.h        |  2 +-
 include/linux/gpio/consumer.h | 10 +++---
 3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d66de67ef307..79ec7a29b684 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = {
 	.name = "gpio",
 };
+/*
+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO ARCH_NR_GPIOS

Also wouldn't this mean that fast path will never be triggered now...

Just to be clearer. That this will always be true.
(chip->ngpio <= FASTPATH_NGPIO)

+
 /* gpio_lock prevents conflicts during gpio_desc[] table updates.
  * While any GPIO is requested, its gpio_chip is not removable;
  * each GPIO's "requested" flag serves as a lock and refcount.
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned int cmd,
 			vals[i] = !!ghd.values[i];
 		/* Reuse the array setting function */
-		gpiod_set_array_value_complex(false,
+		return gpiod_set_array_value_complex(false,
 					      true,
 					      lh->numdescs,
 					      lh->descs,
 					      vals);
-		return 0;
 	}
 	return -EINVAL;
 }
@@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
 		goto err_free_descs;
 	}
+	if (chip->ngpio > FASTPATH_NGPIO)
+		chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n",
+			  chip->ngpio, FASTPATH_NGPIO);
+
 	gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL);
 	if (!gdev->label) {
 		status = -ENOMEM;
@@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 	while (i < array_size) {
 		struct gpio_chip *chip = desc_array[i]->gdev->chip;
-		unsigned long mask[BITS_TO_LONGS(chip->ngpio)];
-		unsigned long bits[BITS_TO_LONGS(chip->ngpio)];
+		unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
+		unsigned long *mask, *bits;
 		int first, j, ret;
+		if (likely(chip->ngpio <= FASTPATH_NGPIO)) {
+			memset(fastpath, 0, sizeof(fastpath));
+			mask = fastpath;
+			bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);

Previously it looks like just mask was zeroed.
So could this just be:
memset(mask, 0, BITS_TO_LONGS(chip->ngpio));

I'm guessing it's not a huge additional overhead as it is, but it's more
in line with what was there.

+		} else {
+			mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio),
+				       sizeof(*mask),
+				       can_sleep ? GFP_KERNEL : GFP_ATOMIC);
+			if (!mask)
+				return -ENOMEM;
+			bits = mask + BITS_TO_LONGS(chip->ngpio);
+		}
+
 		if (!can_sleep)
 			WARN_ON(chip->can_sleep);
 		/* collect all inputs belonging to the same chip */
 		first = i;
-		memset(mask, 0, sizeof(mask));
 		do {
 			const struct gpio_desc *desc = desc_array[i];
 			int hwgpio = gpio_chip_hwgpio(desc);
@@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			 (desc_array[i]->gdev->chip == chip));
 		ret = gpio_chip_get_multiple(chip, mask, bits);
-		if (ret)
+		if (ret) {
+			if (mask != fastpath)
+				kfree(mask);
 			return ret;
+		}
 		for (j = first; j < i; j++) {
 			const struct gpio_desc *desc = desc_array[j];
@@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			value_array[j] = value;
 			trace_gpio_value(desc_to_gpio(desc), 1, value);
 		}
+
+		if (mask != fastpath)
+			kfree(mask);
 	}
 	return 0;
 }
@@ -2878,7 +2904,7 @@ static void gpio_chip_set_multiple(struct gpio_chip *chip,
 	}
 }
-void
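
The fast path / slow path split discussed in this thread can be sketched outside the kernel. What follows is a minimal userspace illustration only, not the gpiolib code: `FASTPATH_NBITS`, `count_distinct_lines()` and the `used_heap` out-parameter are hypothetical names, and `calloc()`/`free()` stand in for `kcalloc()`/`kfree()`. It shows a fixed-size stack array serving requests up to a compile-time bound, with a heap allocation only above it, and the `mask != fastpath` test deciding whether anything needs freeing.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel macros of similar names. */
#define BITS_PER_LONG_     (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_(n)  (((n) + BITS_PER_LONG_ - 1) / BITS_PER_LONG_)

/* Hypothetical compile-time bound; the patch uses ARCH_NR_GPIOS instead. */
#define FASTPATH_NBITS     512

/*
 * Count how many distinct line numbers appear in lines[0..nlines).
 * Returns the count, or -1 if the slow-path allocation fails.
 * *used_heap reports which path was taken (for illustration only).
 */
int count_distinct_lines(unsigned int ngpio, const unsigned int *lines,
                         size_t nlines, int *used_heap)
{
    /* Constant-size array: not a VLA, so -Wvla stays quiet. */
    unsigned long fastpath[BITS_TO_LONGS_(FASTPATH_NBITS)];
    unsigned long *mask;
    int count = 0;

    if (ngpio <= FASTPATH_NBITS) {
        /* Fast path: zero the whole stack buffer, as the patch does. */
        memset(fastpath, 0, sizeof(fastpath));
        mask = fastpath;
    } else {
        /* Slow path: zeroed heap buffer sized for this chip. */
        mask = calloc(BITS_TO_LONGS_(ngpio), sizeof(*mask));
        if (!mask)
            return -1;
    }
    *used_heap = (mask != fastpath);

    for (size_t i = 0; i < nlines; i++) {
        unsigned int line = lines[i];
        unsigned long bit;

        assert(line < ngpio);
        bit = 1UL << (line % BITS_PER_LONG_);
        if (!(mask[line / BITS_PER_LONG_] & bit)) {
            mask[line / BITS_PER_LONG_] |= bit;
            count++;
        }
    }

    /* Only the slow path owns memory that must be released. */
    if (mask != fastpath)
        free(mask);
    return count;
}
```

If the bound is the largest line count any chip can have, as the review comment suggests for ARCH_NR_GPIOS, the heap branch in a sketch like this becomes a fallback that is rarely or never taken.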
[PATCH] mailbox: arm_mhu: add support for mhuv2
ARM has launched a new version of the MHU, MHUv2, with its latest
subsystems. The main change is that the MHUv2 is now a distributed IP
with different peripheral views (registers) for the sender and receiver.

Another main difference is that MHUv1 duplex channels are now split into
simplex/half duplex in MHUv2. MHUv2 has a configurable number of
communication channels. There is a capability register (MSG_NO_CAP) to
find out how many channels are available in a system.

The register offsets have also changed for the STAT, SET & CLEAR
registers, from 0x0, 0x8 & 0x10 in MHUv1 to 0x0, 0xC & 0x8 in MHUv2
respectively.

0x0    0x4    0x8    0xC             0x1F
------------------------------------------
| STAT |      |      | SET |      |      |
------------------------------------------
            Transmit Channel

0x0    0x4    0x8    0xC             0x1F
------------------------------------------
| STAT |      | CLR  |     |      |      |
------------------------------------------
            Receive Channel

The MHU controller can request the receiver to wake up, and once the
request is removed the receiver may go back to sleep, but the MHU itself
does not actively put a receiver to sleep.

So, in order to wake up the receiver when the sender wants to send data,
the sender has to set the ACCESS_REQUEST register first; the resulting
state can be detected using the ACCESS_READY register.

ACCESS_REQUEST has an offset of 0xF88 & ACCESS_READY has an offset
of 0xF8C, and they are accessible only on any sender channel.

This patch adds the changes required to support both the older version
of the MHU & the latest MHUv2 controller. This patch also needs an
update to the DT binding for ARM MHU, as we need a second register base
(tx base) which would be used as the send channel base.

Signed-off-by: Samarth Parikh
---
 drivers/mailbox/arm_mhu.c | 163 ++
 1 file changed, 151 insertions(+), 12 deletions(-)

diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c
index 99befa7..d8825c5 100644
--- a/drivers/mailbox/arm_mhu.c
+++ b/drivers/mailbox/arm_mhu.c
@@ -23,6 +23,8 @@
 #include
 #include
 #include
+#include
+#include
 #define INTR_STAT_OFS 0x0
 #define INTR_SET_OFS 0x8
@@ -33,12 +35,69 @@
 #define MHU_SEC_OFFSET 0x200
 #define TX_REG_OFFSET 0x100
+#define MHU_V2_REG_STAT_OFS 0x0
+#define MHU_V2_REG_CLR_OFS 0x8
+#define MHU_V2_REG_SET_OFS 0xC
+#define MHU_V2_REG_MSG_NO_CAP 0xF80
+#define MHU_V2_REG_ACC_REQ_OFS 0xF88
+#define MHU_V2_REG_ACC_RDY_OFS 0xF8C
+
+#define MHU_V2_LP_OFFSET 0x20
+#define MHU_V2_HP_OFFSET 0x0
+
 #define MHU_CHANS 3
+enum mhu_ver {
+	MHU_V1 = 1,
+	MHU_V2,
+	MHU_VER_END
+};
+
+enum mhu_regs {
+	MHU_REG_STAT,
+	MHU_REG_SET,
+	MHU_REG_CLR,
+	MHU_REG_END
+};
+
+enum mhu_access_regs {
+	MHU_REG_MSG_NO_CAP,
+	MHU_REG_ACC_REQ,
+	MHU_REG_ACC_RDY,
+	MHU_REG_ACC_END
+};
+
+enum mhu_channels {
+	MHU_CHAN_LOW,
+	MHU_CHAN_HIGH,
+	MHU_CHAN_SEC,
+	MHU_CHAN_END
+};
+
+/**
+ * ARM MHU Mailbox device specific data
+ *
+ * @regs: MHU version specific array of register offset for STAT,
+ *	SET & CLEAR registers.
+ * @chans: MHU version specific array of channel offset for Low
+ *	Priority, High Priority & Secure channels.
+ * @acc_regs: An array of access register offsets.
+ * @tx_reg_off: Offset for TX register.
+ * @version: Version of MHU controller available in the system.
+ */
+struct mhu_data {
+	int regs[MHU_REG_END]; /* STAT, SET, CLEAR */
+	int chans[MHU_CHAN_END]; /* LP, HP, Sec */
+	int acc_regs[MHU_REG_ACC_END];
+	long int tx_reg_off;
+	uint8_t version;
+};
+
 struct mhu_link {
 	unsigned irq;
 	void __iomem *tx_reg;
 	void __iomem *rx_reg;
+	unsigned int pchan;
 };
 struct arm_mhu {
@@ -46,21 +105,24 @@ struct arm_mhu {
 	struct mhu_link mlink[MHU_CHANS];
 	struct mbox_chan chan[MHU_CHANS];
 	struct mbox_controller mbox;
+	struct mhu_data *drvdata;
 };
 static irqreturn_t mhu_rx_interrupt(int irq, void *p)
 {
 	struct mbox_chan *chan = p;
 	struct mhu_link *mlink = chan->con_priv;
+	struct arm_mhu *mhu = container_of(chan->mbox, struct arm_mhu, mbox);
+	struct mhu_data *mdata = mhu->drvdata;
 	u32 val;
-	val = readl_relaxed(mlink->rx_reg + INTR_STAT_OFS);
+	val = readl_relaxed(mlink->rx_reg + mdata->regs[MHU_REG_STAT]);
 	if (!val)
 		return IRQ_NONE;
 	mbox_chan_received_data(chan, (void *)&val);
-	writel_relaxed(val, mlink->rx_reg + INTR_CLR_OFS);
+	writel_relaxed(val, mlink->rx_reg + mdata->regs[MHU_REG_CLR]);
 	return IRQ_HANDLED;
 }
@@ -68,7 +130,9 @@ static irqreturn_t mhu_rx_interrupt(int irq, void *p)
 static bool mhu_last_tx_done(struct mbox_chan *chan)
 {
 	struct mhu_link *mlink = chan->con_priv;
-	u32 val = readl_relaxed(mlink->tx_reg + INTR_STAT_OFS);
+	struct arm_mhu
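
The idea of indexing register offsets by MHU version, which the patch introduces through `mdata->regs[...]`, can be illustrated in isolation. This is a hedged userspace sketch rather than the driver code: `struct mhu_offsets`, `mhu_offsets_for()`, `mhu_read()` and `mhu_write()` are hypothetical helpers, and a plain `uint32_t` array stands in for the memory-mapped register window. Only the offset values (0x0/0x8/0x10 for MHUv1, 0x0/0xC/0x8 for MHUv2) are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mhu_ver  { MHU_V1 = 1, MHU_V2 };
enum mhu_regs { MHU_REG_STAT, MHU_REG_SET, MHU_REG_CLR, MHU_REG_END };

/* Hypothetical per-version table of byte offsets. */
struct mhu_offsets {
    uint32_t regs[MHU_REG_END];
};

static const struct mhu_offsets mhu_v1_offsets = {
    .regs = { [MHU_REG_STAT] = 0x0, [MHU_REG_SET] = 0x8, [MHU_REG_CLR] = 0x10 },
};

static const struct mhu_offsets mhu_v2_offsets = {
    .regs = { [MHU_REG_STAT] = 0x0, [MHU_REG_SET] = 0xC, [MHU_REG_CLR] = 0x8 },
};

/* Pick the table for a controller version; NULL for unknown versions. */
const struct mhu_offsets *mhu_offsets_for(enum mhu_ver ver)
{
    switch (ver) {
    case MHU_V1:
        return &mhu_v1_offsets;
    case MHU_V2:
        return &mhu_v2_offsets;
    default:
        return NULL;
    }
}

/* 'base' simulates a register window of 32-bit words. */
uint32_t mhu_read(const uint32_t *base, const struct mhu_offsets *o,
                  enum mhu_regs reg)
{
    assert(reg < MHU_REG_END);
    return base[o->regs[reg] / sizeof(uint32_t)];
}

void mhu_write(uint32_t *base, const struct mhu_offsets *o,
               enum mhu_regs reg, uint32_t val)
{
    assert(reg < MHU_REG_END);
    base[o->regs[reg] / sizeof(uint32_t)] = val;
}
```

With a table like this, code such as the interrupt handler can stay version-agnostic: it asks for `MHU_REG_STAT` or `MHU_REG_CLR` and the table supplies the offset for whichever controller was probed.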
Re: [PATCHv5] gpio: Remove VLA from gpiolib
G'day Laura,

One more comment.

On 16/04/2018 12:41, Phil Reid wrote:
G'day Laura,

On 14/04/2018 05:24, Laura Abbott wrote:
The new challenge is to remove VLAs from the kernel
(see https://lkml.org/lkml/2018/3/7/621) to eventually
turn on -Wvla.

Using a kmalloc array is the easy way to fix this but kmalloc is still
more expensive than stack allocation. Introduce a fast path with a
fixed size stack array to cover most chip with gpios below some fixed
amount. The slow path dynamically allocates an array to cover those
chips with a large number of gpios.

Reviewed-and-tested-by: Lukas Wunner
Signed-off-by: Lukas Wunner
Signed-off-by: Laura Abbott
---
v5: Dropped some outdated comments and extra whitespace. Switched to
ARCH_NR_GPIOS per suggestion of Linus Walleij.
---
 drivers/gpio/gpiolib.c        | 76 +--
 drivers/gpio/gpiolib.h        |  2 +-
 include/linux/gpio/consumer.h | 10 +++---
 3 files changed, 66 insertions(+), 22 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d66de67ef307..79ec7a29b684 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = {
 	.name = "gpio",
 };
+/*
+ * Number of GPIOs to use for the fast path in set array
+ */
+#define FASTPATH_NGPIO ARCH_NR_GPIOS

Also wouldn't this mean that fast path will never be triggered now...

+
 /* gpio_lock prevents conflicts during gpio_desc[] table updates.
  * While any GPIO is requested, its gpio_chip is not removable;
  * each GPIO's "requested" flag serves as a lock and refcount.
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned int cmd,
 			vals[i] = !!ghd.values[i];
 		/* Reuse the array setting function */
-		gpiod_set_array_value_complex(false,
+		return gpiod_set_array_value_complex(false,
 					      true,
 					      lh->numdescs,
 					      lh->descs,
 					      vals);
-		return 0;
 	}
 	return -EINVAL;
 }
@@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data,
 		goto err_free_descs;
 	}
+	if (chip->ngpio > FASTPATH_NGPIO)
+		chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n",
+			  chip->ngpio, FASTPATH_NGPIO);
+
 	gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL);
 	if (!gdev->label) {
 		status = -ENOMEM;
@@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 	while (i < array_size) {
 		struct gpio_chip *chip = desc_array[i]->gdev->chip;
-		unsigned long mask[BITS_TO_LONGS(chip->ngpio)];
-		unsigned long bits[BITS_TO_LONGS(chip->ngpio)];
+		unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)];
+		unsigned long *mask, *bits;
 		int first, j, ret;
+		if (likely(chip->ngpio <= FASTPATH_NGPIO)) {
+			memset(fastpath, 0, sizeof(fastpath));
+			mask = fastpath;
+			bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO);

Previously it looks like just mask was zeroed.
So could this just be:
memset(mask, 0, BITS_TO_LONGS(chip->ngpio));

I'm guessing it's not a huge additional overhead as it is, but it's more
in line with what was there.

+		} else {
+			mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio),
+				       sizeof(*mask),
+				       can_sleep ? GFP_KERNEL : GFP_ATOMIC);
+			if (!mask)
+				return -ENOMEM;
+			bits = mask + BITS_TO_LONGS(chip->ngpio);
+		}
+
 		if (!can_sleep)
 			WARN_ON(chip->can_sleep);
 		/* collect all inputs belonging to the same chip */
 		first = i;
-		memset(mask, 0, sizeof(mask));
 		do {
 			const struct gpio_desc *desc = desc_array[i];
 			int hwgpio = gpio_chip_hwgpio(desc);
@@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			 (desc_array[i]->gdev->chip == chip));
 		ret = gpio_chip_get_multiple(chip, mask, bits);
-		if (ret)
+		if (ret) {
+			if (mask != fastpath)
+				kfree(mask);
 			return ret;
+		}
 		for (j = first; j < i; j++) {
 			const struct gpio_desc *desc = desc_array[j];
@@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep,
 			value_array[j] = value;
 			trace_gpio_value(desc_to_gpio(desc), 1, value);
 		}
+
+		if (mask != fastpath)
+			kfree(mask);
 	}
 	return 0;
 }
@@ -2878,7 +2904,7 @@ static void gpio_chip_set_multiple(struct gpio_chip *chip,
 	}
 }
-void gpiod_set_array_value_complex(bool raw, bool can_sleep,
+int gpiod_set_array_value_complex(bool raw, bool can_sleep,
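
One detail about the narrower memset() suggested in the review: the third argument of memset() is a byte count, so a call that passes only BITS_TO_LONGS(n) would, taken literally, clear that many bytes rather than that many longs. The sketch below is a userspace illustration under that reading; `zero_mask()` is a hypothetical helper, not a kernel function, and the macros are local stand-ins for the kernel ones.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Userspace stand-ins for the kernel macros of similar names. */
#define BITS_PER_LONG_     (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS_(n)  (((n) + BITS_PER_LONG_ - 1) / BITS_PER_LONG_)

/*
 * Clear just enough whole words of 'mask' to cover 'nbits' bits.
 * Returns the number of bytes cleared.
 */
size_t zero_mask(unsigned long *mask, size_t nbits)
{
    /* Word count scaled by the word size gives memset()'s byte count. */
    size_t nbytes = BITS_TO_LONGS_(nbits) * sizeof(*mask);

    assert(mask != NULL);
    memset(mask, 0, nbytes);
    return nbytes;
}
```

Whether it is safe to leave `bits` uninitialised in such a variant depends on how the callee fills it, which this sketch does not model.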
Re: [PATCH v4 08/15] KVM: s390: interfaces to (de)configure guest's AP matrix
Hi Tony,

I love your patch! Yet something to improve:

[auto build test ERROR on s390/features]
[also build test ERROR on v4.17-rc1 next-20180413]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tony-Krowiak/s390-vfio-ap-guest-dedicated-crypto-adapters/20180416-052759
base:   https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
config: s390-alldefconfig (attached as .config)
compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=s390

All errors (new ones prefixed by >>):

   arch/s390/kvm/kvm-ap.o: In function `kvm_ap_matrix_create':
>> kvm-ap.c:(.text+0x176): undefined reference to `ap_query_configuration'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
nds32 build failures
I thought I should give the brand new architecture a try.
Unfortunately, that was not very successful.

Build reference: v4.17-rc1
gcc version: nds32le-elf-gcc (GCC) 7.3.0

Building nds32:defconfig ... failed

arch/nds32/include/asm/nds32.h: In function 'GIE_ENABLE':
arch/nds32/include/asm/nds32.h:25:2: error: implicit declaration of function '__nds32__gie_en'; did you mean '__nds32__'?
arch/nds32/include/asm/nds32.h: In function 'CACHE_SET':
arch/nds32/include/asm/nds32.h:38:18: error: implicit declaration of function '__nds32__mfsr'; did you mean '__nds32__'?
arch/nds32/include/asm/nds32.h:38:32: error: 'NDS32_SR_ICM_CFG' undeclared
arch/nds32/include/asm/nds32.h:41:32: error: 'NDS32_SR_DCM_CFG'

Am I missing something ?

Guenter
Re: [linux-sunxi] [PATCH v2 00/10] Allwinner H3 DVFS support
On Mon, Apr 16, 2018 at 12:41 PM, Chen-Yu Tsai wrote:
> Hi,
>
> On Tue, Feb 6, 2018 at 12:48 PM, Icenowy Zheng wrote:
>> This patchset tries to add DVFS support for Allwinner H3 SoC,
>> considering two kinds of adjustable regulators used on H3 boards:
>> SY8106A I2C-controlled regulator and SY8113B regulator (controllable
>> by GPIO with some special designs on the board), and also taking the
>> uncontrollable boards into consider.
>>
>> PATCH 1 and PATCH 2 are for the SY8106A regulator, then PATCH 3 and
>> PATCH 4 are for the r_i2c bus, which is used by boards with SY8106A
>> to communicate with the regulator.
>>
>> PATCH 5 adds the operating points v2 table to the H3 SoC, but with
>> OPPs higher than 1008MHz temporarily dropped.
>>
>> Then there's patches for several tested boards: Orange Pi PC (with
>> SY8106A), Orange Pi One/Zero (with GPIO-adjustable SY8113B) and
>> ALL-H3-CC (unadjustable).
>>
>> Icenowy Zheng (5):
>>   ARM: sun8i: h3: add operating-points-v2 table for CPU
>>   ARM: sun8i: h2+: add SY8113B regulator used by Orange Pi Zero board
>>   ARM: sun8i: h3: add SY8113B regulator used by Orange Pi One board
>>   ARM: sun8i: h3: fix ALL-H3-CC H3 ver VDD-CPUX voltage
>>   ARM: sun8i: h3: set the cpu-supply to VDD-CPUX on ALL-H3-CC H3 ver
>>
>> Ondrej Jirman (5):
>>   dt-bindings: add binding for the SY8106A voltage regulator
>>   regulator: add support for SY8106A regulator
>>   ARM: sunxi: h3/h5: Add r_i2c pinmux node
>>   ARM: sunxi: h3/h5: Add r_i2c I2C controller
>>   ARM: sun8i: h3: Add SY8106A regulator to Orange Pi PC
>
> I've applied all the device tree patches for 4.18, taking into account
> comments from Maxime. See
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.17

I meant

https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.18

of course...

> Mostly it's just renaming the regulator node names and labels.
>
> Please resend the first two patches to Mark Brown, the regulator
> subsystem maintainer. And you might want to mention the branch
> above in case he needs a use case reference.
>
> Regards
> ChenYu
Re: [linux-sunxi] [PATCH v2 00/10] Allwinner H3 DVFS support
On Mon, Apr 16, 2018 at 12:41 PM, Chen-Yu Tsai wrote:
> Hi,
>
> On Tue, Feb 6, 2018 at 12:48 PM, Icenowy Zheng wrote:
>> This patchset tries to add DVFS support for Allwinner H3 SoC,
>> considering two kinds of adjustable regulators used on H3 boards:
>> SY8106A I2C-controlled regulator and SY8113B regulator (controllable
>> by GPIO with some special designs on the board), and also taking the
>> uncontrollable boards into consider.
>>
>> PATCH 1 and PATCH 2 are for the SY8106A regulator, then PATCH 3 and
>> PATCH 4 are for the r_i2c bus, which is used by boards with SY8106A
>> to communicate with the regulator.
>>
>> PATCH 5 adds the operating points v2 table to the H3 SoC, but with
>> OPPs higher than 1008MHz temporarily dropped.
>>
>> Then there's patches for several tested boards: Orange Pi PC (with
>> SY8106A), Orange Pi One/Zero (with GPIO-adjustable SY8113B) and
>> ALL-H3-CC (unadjustable).
>>
>> Icenowy Zheng (5):
>>   ARM: sun8i: h3: add operating-points-v2 table for CPU
>>   ARM: sun8i: h2+: add SY8113B regulator used by Orange Pi Zero board
>>   ARM: sun8i: h3: add SY8113B regulator used by Orange Pi One board
>>   ARM: sun8i: h3: fix ALL-H3-CC H3 ver VDD-CPUX voltage
>>   ARM: sun8i: h3: set the cpu-supply to VDD-CPUX on ALL-H3-CC H3 ver
>>
>> Ondrej Jirman (5):
>>   dt-bindings: add binding for the SY8106A voltage regulator
>>   regulator: add support for SY8106A regulator
>>   ARM: sunxi: h3/h5: Add r_i2c pinmux node
>>   ARM: sunxi: h3/h5: Add r_i2c I2C controller
>>   ARM: sun8i: h3: Add SY8106A regulator to Orange Pi PC
>
> I've applied all the device tree patches for 4.18, taking into account
> comments from Maxime. See
>
> https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.17

I meant

https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.18

of course...

> Mostly it's just renaming the regulator node names and labels.
>
> Please resend the first two patches to Mark Brown, the regulator
> subsystem maintainer. And you might want to mention the branch
> above in case he needs a use case reference.
>
> Regards
> ChenYu
Re: [PATCHv5] gpio: Remove VLA from gpiolib
G'day Laura, On 14/04/2018 05:24, Laura Abbott wrote: The new challenge is to remove VLAs from the kernel (see https://lkml.org/lkml/2018/3/7/621) to eventually turn on -Wvla. Using a kmalloc array is the easy way to fix this but kmalloc is still more expensive than stack allocation. Introduce a fast path with a fixed size stack array to cover most chip with gpios below some fixed amount. The slow path dynamically allocates an array to cover those chips with a large number of gpios. Reviewed-and-tested-by: Lukas Wunner Signed-off-by: Lukas Wunner Signed-off-by: Laura Abbott --- v5: Dropped some outdated comments and extra whitespace. Switched to ARCH_NR_GPIOS per suggestion of Linus Walleij. --- drivers/gpio/gpiolib.c| 76 +-- drivers/gpio/gpiolib.h| 2 +- include/linux/gpio/consumer.h | 10 +++--- 3 files changed, 66 insertions(+), 22 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d66de67ef307..79ec7a29b684 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -61,6 +61,11 @@ static struct bus_type gpio_bus_type = { .name = "gpio", }; +/* + * Number of GPIOs to use for the fast path in set array + */ +#define FASTPATH_NGPIO ARCH_NR_GPIOS + /* gpio_lock prevents conflicts during gpio_desc[] table updates. * While any GPIO is requested, its gpio_chip is not removable; * each GPIO's "requested" flag serves as a lock and refcount. 
@@ -399,12 +404,11 @@ static long linehandle_ioctl(struct file *filep, unsigned int cmd, vals[i] = !!ghd.values[i]; /* Reuse the array setting function */ - gpiod_set_array_value_complex(false, + return gpiod_set_array_value_complex(false, true, lh->numdescs, lh->descs, vals); - return 0; } return -EINVAL; } @@ -1192,6 +1196,10 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, void *data, goto err_free_descs; } + if (chip->ngpio > FASTPATH_NGPIO) + chip_warn(chip, "line cnt %d is greater than fast path cnt %d\n", + chip->ngpio, FASTPATH_NGPIO); + gdev->label = kstrdup_const(chip->label ?: "unknown", GFP_KERNEL); if (!gdev->label) { status = -ENOMEM; @@ -2662,16 +2670,28 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep, while (i < array_size) { struct gpio_chip *chip = desc_array[i]->gdev->chip; - unsigned long mask[BITS_TO_LONGS(chip->ngpio)]; - unsigned long bits[BITS_TO_LONGS(chip->ngpio)]; + unsigned long fastpath[2 * BITS_TO_LONGS(FASTPATH_NGPIO)]; + unsigned long *mask, *bits; int first, j, ret; + if (likely(chip->ngpio <= FASTPATH_NGPIO)) { + memset(fastpath, 0, sizeof(fastpath)); + mask = fastpath; + bits = fastpath + BITS_TO_LONGS(FASTPATH_NGPIO); Previously it looks like just mask was zeroed. So could this just be: memset(mask, 0, BITS_TO_LONGS(chip->ngpio)); I'm guessing it's not a huge additional overhead as it is, but it's more in line with what was there. + } else { + mask = kcalloc(2 * BITS_TO_LONGS(chip->ngpio), + sizeof(*mask), + can_sleep ? 
GFP_KERNEL : GFP_ATOMIC); + if (!mask) + return -ENOMEM; + bits = mask + BITS_TO_LONGS(chip->ngpio); + } + if (!can_sleep) WARN_ON(chip->can_sleep); /* collect all inputs belonging to the same chip */ first = i; - memset(mask, 0, sizeof(mask)); do { const struct gpio_desc *desc = desc_array[i]; int hwgpio = gpio_chip_hwgpio(desc); @@ -2682,8 +2702,11 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep, (desc_array[i]->gdev->chip == chip)); ret = gpio_chip_get_multiple(chip, mask, bits); - if (ret) + if (ret) { + if (mask != fastpath) + kfree(mask); return ret; + } for (j = first; j < i; j++) { const struct gpio_desc *desc = desc_array[j]; @@ -2695,6 +2718,9 @@ int gpiod_get_array_value_complex(bool raw, bool can_sleep, value_array[j] = value; trace_gpio_value(desc_to_gpio(desc), 1, value); } + + if (mask != fastpath) +
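The fast-path/slow-path buffer choice discussed in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: FASTPATH_NBITS, count_odd_lines() and the fake "read" step are made up for the example, and calloc()/free() stand in for kcalloc()/kfree(). The sketch zeroes only the mask half of the stack buffer, along the lines of the review comment; note that memset() takes a byte count, so the word count has to be multiplied by sizeof(*mask).

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_ULONG     (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_ULONGS(n)  (((n) + BITS_PER_ULONG - 1) / BITS_PER_ULONG)
#define FASTPATH_NBITS     512	/* hypothetical stand-in for FASTPATH_NGPIO */

/*
 * Count how many of the requested lines have an odd line number.
 * Out-of-range requests are ignored. Returns -1 if the slow path
 * cannot allocate memory.
 */
int count_odd_lines(size_t nlines, const unsigned int *req, size_t nreq)
{
	/* Fixed-size stack buffer: first half is mask, second half is bits. */
	unsigned long fastpath[2 * BITS_TO_ULONGS(FASTPATH_NBITS)];
	/* 0xAAAA...: every odd bit position set */
	const unsigned long odd = ~0UL / 3 * 2;
	size_t words = BITS_TO_ULONGS(nlines);
	unsigned long *mask, *bits;
	size_t i;
	int count = 0;

	assert(req != NULL || nreq == 0);

	if (nlines <= FASTPATH_NBITS) {
		/* Fast path: no allocation, zero only the words of mask in use. */
		mask = fastpath;
		bits = fastpath + BITS_TO_ULONGS(FASTPATH_NBITS);
		memset(mask, 0, words * sizeof(*mask));
	} else {
		/* Slow path: one zeroed allocation holding both halves. */
		mask = calloc(2 * words, sizeof(*mask));
		if (!mask)
			return -1;
		bits = mask + words;
	}

	/* Collect the requested lines in mask. */
	for (i = 0; i < nreq; i++)
		if (req[i] < nlines)
			mask[req[i] / BITS_PER_ULONG] |=
				1UL << (req[i] % BITS_PER_ULONG);

	/* Fake "read": every word of bits is written before it is used. */
	for (i = 0; i < words; i++) {
		unsigned long w;

		bits[i] = mask[i] & odd;
		for (w = bits[i]; w; w &= w - 1)
			count++;
	}

	/* Only the slow path owns heap memory. */
	if (mask != fastpath)
		free(mask);
	return count;
}
```

Every exit after the allocation has to go through the "mask != fastpath" check, which is what the error-path hunks in the patch add.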
Re: [linux-sunxi] [PATCH v2 00/10] Allwinner H3 DVFS support
Hi, On Tue, Feb 6, 2018 at 12:48 PM, Icenowy Zheng wrote: > This patchset tries to add DVFS support for Allwinner H3 SoC, > considering two kinds of adjustable regulators used on H3 boards: > SY8106A I2C-controlled regulator and SY8113B regulator (controllable > by GPIO with some special designs on the board), and also taking the > uncontrollable boards into consider. > > PATCH 1 and PATCH 2 are for the SY8106A regulator, then PATCH 3 and > PATCH 4 are for the r_i2c bus, which is used by boards with SY8106A > to communicate with the regulator. > > PATCH 5 adds the operating points v2 table to the H3 SoC, but with > OPPs higher than 1008MHz temporarily dropped. > > Then there's patches for several tested boards: Orange Pi PC (with > SY8106A), Orange Pi One/Zero (with GPIO-adjustable SY8113B) and > ALL-H3-CC (unadjustable). > > Icenowy Zheng (5): > ARM: sun8i: h3: add operating-points-v2 table for CPU > ARM: sun8i: h2+: add SY8113B regulator used by Orange Pi Zero board > ARM: sun8i: h3: add SY8113B regulator used by Orange Pi One board > ARM: sun8i: h3: fix ALL-H3-CC H3 ver VDD-CPUX voltage > ARM: sun8i: h3: set the cpu-supply to VDD-CPUX on ALL-H3-CC H3 ver > > Ondrej Jirman (5): > dt-bindings: add binding for the SY8106A voltage regulator > regulator: add support for SY8106A regulator > ARM: sunxi: h3/h5: Add r_i2c pinmux node > ARM: sunxi: h3/h5: Add r_i2c I2C controller > ARM: sun8i: h3: Add SY8106A regulator to Orange Pi PC I've applied all the device tree patches for 4.18, taking into account comments from Maxime. See https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git/log/?h=sunxi/h3-h5-for-4.17 Mostly it's just renaming the regulator node names and labels. Please resend the first two patches to Mark Brown, the regulator subsystem maintainer. And you might want to mention the branch above in case he needs a use case reference. Regards ChenYu
Re: [PATCH v3 4/4] mm/sparse: Optimize memmap allocation during sparse_init()
On 04/14/2018 07:19 PM, Baoquan He wrote:
>>> Yes, this place is the hardest to understand. The temporary arrays are
>>> allocated beforehand with the size of 'nr_present_sections'. The error
>>> paths you mentioned is caused by allocation failure of mem_map or
>>> map_map, but whatever it's error or success paths, the sections must be
>>> marked as present in memory_present(). Error or success paths happened
>>> in alloc_usemap_and_memmap(), while checking if it's error or success
>>> paths happened in the last for_each_present_section_nr() of
>>> sparse_init(), and clear the ms->section_mem_map if it goes along error
>>> paths. This is the key point of this new allocation way.
>> I think you owe some commenting because this is so hard to understand.
> I can arrange and write a code comment above sparse_init() according to
> this patch's git log, do you think it's OK?
>
> Honestly, it took me several days to write code, while I spent more
> than one week to write the patch log. Writing patch log is really a
> headache to me.

I often find the same: writing the code is the easy part. Explaining
why it is right is the hard part.
Re: [PATCH v5 1/3] regulator: axp20x: add drivevbus support for axp803
On Thu, Apr 5, 2018 at 2:46 PM, Maxime Ripard wrote:
> On Thu, Apr 05, 2018 at 12:11:39PM +0530, Jagan Teki wrote:
>> On Tue, Mar 27, 2018 at 11:01 AM, Jagan Teki wrote:
>> > Like axp221, axp223, axp813 the axp803 is also supporting external
>> > regulator to drive the OTG VBus through N_VBUSEN PMIC pin.
>> >
>> > Add support for it.
>> >
>> > Signed-off-by: Jagan Teki
>> > Reviewed-by: Rob Herring
>> > Reviewed-by: Chen-Yu Tsai
>> > ---
>> > Changes for v5:
>> > - Collect Chen-Yu reviewed-by tag
>> > Changes for v4:
>> > - rebase on master
>> > Changes for v3:
>> > - Update drivevbus in table of regulators
>>
>> Can you pick these, has some dependency with drivevbus on other
>> patches.
>
> I'm not the regulator maintainer, nor the AXP maintainer for that
> matter. Mark Brown and Chen-Yu are, respectively.

I've already reviewed all the patches. Please resend the series and
include Mark Brown, the regulator subsystem maintainer. He clearly
isn't in the current recipient list, so no wonder things didn't move
forward.

Once he applies the driver bits, we'll apply any pending device tree
changes.

ChenYu
Re: [PATCH] mtd: nand: mtk: use of_device_get_match_data()
On Mon, 2018-04-16 at 10:33 +0800, Ryder Lee wrote:
> The usage of of_device_get_match_data() reduce the code size a bit.
>
> Also, the only way to call .probe() is to match an entry in
> .of_match_table[], so of_device_id cannot be NULL.
>
> Signed-off-by: Ryder Lee
> ---
>  drivers/mtd/nand/raw/mtk_ecc.c  |  7 +--
>  drivers/mtd/nand/raw/mtk_nand.c | 10 +-
>  2 files changed, 2 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/mtd/nand/mtk_ecc.c b/drivers/mtd/nand/mtk_ecc.c
> index 40d86a8..6432bd7 100644
> --- a/drivers/mtd/nand/raw/mtk_ecc.c
> +++ b/drivers/mtd/nand/raw/mtk_ecc.c
> @@ -500,7 +500,6 @@ static int mtk_ecc_probe(struct platform_device *pdev)
>  	struct device *dev = &pdev->dev;
>  	struct mtk_ecc *ecc;
>  	struct resource *res;
> -	const struct of_device_id *of_ecc_id = NULL;
>  	u32 max_eccdata_size;
>  	int irq, ret;
>
> @@ -508,11 +507,7 @@ static int mtk_ecc_probe(struct platform_device *pdev)
>  	if (!ecc)
>  		return -ENOMEM;
>
> -	of_ecc_id = of_match_device(mtk_ecc_dt_match, &pdev->dev);
> -	if (!of_ecc_id)
> -		return -ENODEV;
> -
> -	ecc->caps = of_ecc_id->data;
> +	ecc->caps = of_device_get_match_data(dev);
>

Thanks.
Reviewed-by: Xiaolei Li

>  	max_eccdata_size = ecc->caps->num_ecc_strength - 1;
>  	max_eccdata_size = ecc->caps->ecc_strength[max_eccdata_size];
> diff --git a/drivers/mtd/nand/mtk_nand.c b/drivers/mtd/nand/mtk_nand.c
> index 6977da3..75c845a 100644
> --- a/drivers/mtd/nand/raw/mtk_nand.c
> +++ b/drivers/mtd/nand/raw/mtk_nand.c
> @@ -1434,7 +1434,6 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>  	struct device_node *np = dev->of_node;
>  	struct mtk_nfc *nfc;
>  	struct resource *res;
> -	const struct of_device_id *of_nfc_id = NULL;
>  	int ret, irq;
>
>  	nfc = devm_kzalloc(dev, sizeof(*nfc), GFP_KERNEL);
> @@ -1452,6 +1451,7 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>  	else if (!nfc->ecc)
>  		return -ENODEV;
>
> +	nfc->caps = of_device_get_match_data(dev);
>  	nfc->dev = dev;
>
>  	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> @@ -1498,14 +1498,6 @@ static int mtk_nfc_probe(struct platform_device *pdev)
>  		goto clk_disable;
>  	}
>
> -	of_nfc_id = of_match_device(mtk_nfc_id_table, &pdev->dev);
> -	if (!of_nfc_id) {
> -		ret = -ENODEV;
> -		goto clk_disable;
> -	}
> -
> -	nfc->caps = of_nfc_id->data;
> -
>  	platform_set_drvdata(pdev, nfc);
>
>  	ret = mtk_nfc_nand_chips_init(dev, nfc);
Re: [PATCH] printk: Ratelimit messages printed by console drivers
On (04/16/18 10:47), Sergey Senozhatsky wrote:
> On (04/14/18 11:35), Sergey Senozhatsky wrote:
> > On (04/13/18 10:12), Steven Rostedt wrote:
> > >
> > > > The interval is set to one hour. It is rather arbitrary selected time.
> > > > It is supposed to be a compromise between never print these messages,
> > > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > > and get information if something changes over time.
> > >
> > >
> > > I think an hour is incredibly long. We only allow 100 lines per hour for
> > > printks happening inside another printk?
> > >
> > > I think 5 minutes (at most) would probably be plenty. One minute may be
> > > good enough.
> >
> > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > My call would be - up to 1000 lines in a 1 minute interval.
>
> Well, if we want to basically turn printk_safe() into
> printk_safe_ratelimited().
> I'm not so sure about it.
>
> Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> to rate limit it - for no reason.
>
> Dunno, can we keep printk_safe() the way it is and introduce a new
> printk_safe_ratelimited() specifically for call_console_drivers()?
>
> Lockdep splat is a one time event, if we lose half of it - we, most
> like, lose the entire report. And call_console_drivers() is not the
> one and only source of warnings/errors/etc. So if we turn printk_safe
> into printk_safe_ratelimited() [not sure we want to do it] for all
> then I want restrictions to be as low as possible, IOW to log_store()
> as many lines as possible.

One more thing,

I'd really prefer to rate limit the function which flushes per-CPU
printk_safe buffers; not the function that appends new messages to
the per-CPU printk_safe buffers. There is a significant difference.

printk_safe does not help us when we are dealing with any external
locks - and call_console_drivers() is precisely that type of case.
The very next thing to happen after lockdep splat, or spin_lock
debugging report, etc. can be an actual deadlock->panic(). Thus I want
to have the entire report in per-CPU buffer [if possible], so we can
flush_on_panic() per-CPU buffers, or at least move the data to the
logbuf and make it accessible in vmcore. If we rate limit the function
that appends data to the per-CPU buffer then we may simply suppress
[rate limit] the report, so there will be nothing to flush_on_panic().

	-ss
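The difference between rate limiting the append side and the flush side can be sketched in userspace C. This is not the kernel's printk_safe code; struct safe_buf, the safe_buf_*() helpers, BUF_LINES, FLUSH_BURST and FLUSH_INTERVAL are all hypothetical names chosen for the illustration. Appending is never rate limited, the regular flush is limited to a burst per interval, and the panic-time flush bypasses the limit so the whole report is still available.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_LINES       64	/* hypothetical buffer capacity, in lines */
#define FLUSH_BURST     3	/* lines a regular flush may emit per interval */
#define FLUSH_INTERVAL  60	/* interval length, in seconds */

struct safe_buf {
	const char *line[BUF_LINES];
	size_t first;		/* index of the oldest pending line */
	size_t count;		/* number of pending lines */
	long window_start;	/* start of the current rate-limit window */
	size_t emitted;		/* lines emitted in the current window */
};

/* Append is never rate limited; it only fails when the buffer is full. */
int safe_buf_append(struct safe_buf *b, const char *msg)
{
	assert(b != NULL);
	if (b->count == BUF_LINES)
		return -1;
	b->line[(b->first + b->count) % BUF_LINES] = msg;
	b->count++;
	return 0;
}

/* Emit up to max pending lines, oldest first; emit may be NULL. */
static size_t drain(struct safe_buf *b, size_t max,
		    void (*emit)(const char *))
{
	size_t n = 0;

	while (b->count && n < max) {
		if (emit)
			emit(b->line[b->first]);
		b->first = (b->first + 1) % BUF_LINES;
		b->count--;
		n++;
	}
	return n;
}

/* Regular flush: at most FLUSH_BURST lines per FLUSH_INTERVAL seconds. */
size_t safe_buf_flush(struct safe_buf *b, long now,
		      void (*emit)(const char *))
{
	size_t n;

	assert(b != NULL);
	if (now - b->window_start >= FLUSH_INTERVAL) {
		b->window_start = now;
		b->emitted = 0;
	}
	n = drain(b, FLUSH_BURST - b->emitted, emit);
	b->emitted += n;
	return n;
}

/* Panic path: ignore the rate limit and emit everything still pending. */
size_t safe_buf_flush_on_panic(struct safe_buf *b,
			       void (*emit)(const char *))
{
	assert(b != NULL);
	return drain(b, b->count, emit);
}
```

With the limit on the flush side, lines that exceed the budget stay queued instead of being dropped, so a later panic-time flush can still emit them.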
Re: [PATCH v4 4/4] zram: introduce zram memory tracking
On Sun, Apr 15, 2018 at 09:17:45PM -0700, Randy Dunlap wrote:
> On 04/15/2018 08:31 PM, Minchan Kim wrote:
> > zRam as swap is useful for small memory device. However, swap means
> > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > Especially, once init data for application are touched for launching,
> > they tend to be not accessed any more and finally swapped out.
> > zRAM can store such cold pages as compressed form but it's pointless
> > to keep in memory. Better idea is app developers free them directly
> > rather than remaining them on heap.
> >
> > This patch tell us last access time of each block of zram via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> >
> > The output is as follows,
> >       300    75.033841 .wh
> >       301    63.806904 s..
> >       302    63.806919 ..h
> >
> > First column is zram's block index and 3rh one represents symbol
> > (s: same page w: written page to backing store h: huge page) of the
> > block state. Second column represents usec time unit of the block
> > was last accessed. So above example means the 300th block is accessed
> > at 75.033851 second and it was huge so it was written to the backing
> > store.
> >
> > Admin can leverage this information to catch cold|incompressible pages
> > of process with *pagemap* once part of heaps are swapped out.
> >
> > Acked-by: Greg Kroah-Hartman
> > Signed-off-by: Minchan Kim
> > ---
> >  Documentation/blockdev/zram.txt | 24 ++
> >  drivers/block/zram/Kconfig      | 10 +++
> >  drivers/block/zram/zram_drv.c   | 140 +---
> >  drivers/block/zram/zram_drv.h   | 5 ++
> >  4 files changed, 168 insertions(+), 11 deletions(-)
> >
> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> > index 78db38d02bc9..45509c7d5716 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
> >  User should set up backing device via /sys/block/zramX/backing_dev
> >  before disksize setting.
> >
> > += memory tracking
> > +
> > +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> > +zram block. It could be useful to catch cold or incompressible
> > +pages of the proess with*pagemap.
>
> ?              process
>
> > +If you enable the feature, you could see block state via
> > +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> > +
> > +      300    75.033841 .wh
> > +      301    63.806904 s..
> > +      302    63.806919 ..h
> > +
> > +First column is zram's block index.
> > +Second column is access time.
> > +Third column is state of the block.
> > +(s: same page
> > +w: written page to backing store
> > +h: huge page)
> > +
> > +First line of above example says 300th block is accessed at 75.033841sec
> > +and the block's state is huge so it is written back to the backing
> > +storage. It's a debugging feature so anyone shouldn't rely on it to work
> > +properly.
> > +
> >  Nitin Gupta
> >  ngu...@vflare.org
> > diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> > index ac3a31d433b2..01090338fb47 100644
> > --- a/drivers/block/zram/Kconfig
> > +++ b/drivers/block/zram/Kconfig
> > @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
> >  	 /sys/block/zramX/backing_dev.
> >
> >  	 See zram.txt for more infomration.
> > +
> > +config ZRAM_MEMORY_TRACKING
> > +	bool "Tracking zram block status"
>
> 	bool "Track zram block status"
>
> although sometimes it is zRam or zRAM.
>
> > +	depends on ZRAM && DEBUG_FS
> > +	help
> > +	  With this feature, admin can track the state of allocated block
>
> 	                                                            blocks
>
> > +	  of zRAM. Admin could see the information via
> > +	  /sys/kernel/debug/zram/zramX/block_state.
> > +
> > +	  See zram.txt for more information.
>
> See Documentation/blockdev/zram.txt for more information.

I just fix things. I will wait more feedback and then resend.
Thanks for the review!
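For anyone who wants to consume block_state from a tool, a minimal userspace sketch of parsing one line of the format described above might look like this. The struct and function names are hypothetical, and it assumes the fractional part of the access time is always six digits (microseconds), which is what the sample output suggests but is not stated as a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of block_state output. */
struct zram_block_state {
	unsigned long index;	/* first column: block index */
	unsigned long sec;	/* second column: access time, seconds part */
	unsigned long usec;	/* second column: access time, microseconds part */
	bool same;		/* 's': same page */
	bool written_back;	/* 'w': written to backing store */
	bool huge;		/* 'h': huge page */
};

/* Parse a line such as "300 75.033841 .wh"; returns true on success. */
static bool parse_block_state(const char *line, struct zram_block_state *st)
{
	char flags[4];

	/* Assumption: time is printed as <sec>.<6-digit usec>. */
	if (sscanf(line, "%lu %lu.%6lu %3s",
		   &st->index, &st->sec, &st->usec, flags) != 4)
		return false;
	if (strlen(flags) != 3)
		return false;

	st->same = flags[0] == 's';
	st->written_back = flags[1] == 'w';
	st->huge = flags[2] == 'h';
	return true;
}
```

Since the patch itself says this is a debugging feature that should not be relied on, a parser like this should treat any line it cannot match as "unknown" rather than as an error.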
[PATCH] perf tools: set kernel end address properly
The map_groups__fixup_end() was called to set end addresses of kernel map
and module maps. But now machine__create_modules() is set the end address
of modules properly so the only remaining piece is the kernel map.

We can set it with adjacent module's address directly instead of calling
the map_groups__fixup_end(). If there's no module after the kernel map,
the end address will be ~0ULL.

Reported-by: Kim Phillips
Signed-off-by: Namhyung Kim
---
 tools/perf/util/machine.c | 20
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2eca8478e24f..be328416de61 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1019,13 +1019,6 @@ int machine__load_vmlinux_path(struct machine *machine, enum map_type type)
 	return ret;
 }
 
-static void map_groups__fixup_end(struct map_groups *mg)
-{
-	int i;
-	for (i = 0; i < MAP__NR_TYPES; ++i)
-		__map_groups__fixup_end(mg, i);
-}
-
 static char *get_kernel_version(const char *root_dir)
 {
 	char version[PATH_MAX];
@@ -1233,7 +1226,9 @@ int machine__create_kernel_maps(struct machine *machine)
 {
 	struct dso *kernel = machine__get_kernel(machine);
 	const char *name = NULL;
+	struct map *map;
 	u64 addr = 0;
+	u64 end = ~0ULL;
 	int ret;
 
 	if (kernel == NULL)
@@ -1259,13 +1254,14 @@ int machine__create_kernel_maps(struct machine *machine)
 			machine__destroy_kernel_maps(machine);
 			return -1;
 		}
-		machine__set_kernel_mmap(machine, addr, 0);
 	}
 
-	/*
-	 * Now that we have all the maps created, just set the ->end of them:
-	 */
-	map_groups__fixup_end(>kmaps);
+	/* update end address of the kernel map using adjacent module address */
+	map = map__next(machine__kernel_map(machine));
+	if (map)
+		end = map->start;
+
+	machine__set_kernel_mmap(machine, addr, end);
 	return 0;
 }
-- 
2.16.2
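The end-address rule from the commit message can be shown in isolation with a small userspace sketch. struct kmap and set_kernel_end() are hypothetical simplifications and not the perf data structures. The kernel map ends where the next map (the first module) starts, or at ~0ULL when nothing follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map: only the fields this sketch needs. */
struct kmap {
	uint64_t start;
	uint64_t end;
	struct kmap *next;	/* next map in ascending address order, or NULL */
};

/* Set the kernel map's end from the adjacent map, defaulting to ~0ULL. */
static void set_kernel_end(struct kmap *kernel)
{
	uint64_t end = ~0ULL;

	if (kernel->next)
		end = kernel->next->start;
	kernel->end = end;
}
```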
[PATCH 05/25] staging: lustre: libcfs: remove excess space
From: Amir Shehata

The function cfs_cpt_table_print() was adding two spaces
to the string buffer. Just add it once.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index d207ae5..b2a88ef 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -147,7 +147,7 @@ struct cfs_cpt_table *
 
 	for (i = 0; i < cptab->ctb_nparts; i++) {
 		if (len > 0) {
-			rc = snprintf(tmp, len, "%d\t: ", i);
+			rc = snprintf(tmp, len, "%d\t:", i);
 			len -= rc;
 		}
 
-- 
1.8.3.1
[PATCH 02/22] staging: lustre: obd: create it_has_reply_body()
From: Vitaly Fertman

The lookup_intent it_op fields in many cases will be compared to
the settings of IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR.
Create a simple inline function for this common case.

Signed-off-by: Vitaly Fertman
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-on: http://review.whamcloud.com/17220
Reviewed-by: Andrew Perepechko
Reviewed-by: Andriy Skulysh
Tested-by: Elena V. Gryaznova
Reviewed-by: John L. Hammond
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/include/obd.h   | 10 ++
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 2 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index f1233ca..ea6056b 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -686,6 +686,16 @@ enum md_cli_flags {
 	CLI_MIGRATE = BIT(4),
 };
 
+/**
+ * GETXATTR is not included as only a couple of fields in the reply body
+ * is filled, but not FID which is needed for common intent handling in
+ * mdc_finish_intent_lock()
+ */
+static inline bool it_has_reply_body(const struct lookup_intent *it)
+{
+	return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR);
+}
+
 struct md_op_data {
 	struct lu_fid op_fid1; /* operation fid1 (usually parent) */
 	struct lu_fid op_fid2; /* operation fid2 (usually child) */
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 695ef44..309ead1 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -568,7 +568,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
 	       it->it_op, it->it_disposition, it->it_status);
 
 	/* We know what to expect, so we do any byte flipping required here */
-	if (it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR)) {
+	if (it_has_reply_body(it)) {
 		struct mdt_body *body;
 
 		body = req_capsule_server_get(pill, _MDT_BODY);
-- 
1.8.3.1
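As a standalone illustration of what the new helper does, the sketch below can be compiled in userspace. The IT_* values and the one-field struct lookup_intent here are placeholders invented for the example; the real definitions are in the Lustre headers and are not reproduced in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values, chosen only so the sketch compiles. */
enum {
	IT_OPEN     = 1 << 0,
	IT_LOOKUP   = 1 << 1,
	IT_GETATTR  = 1 << 2,
	IT_UNLINK   = 1 << 3,
	IT_GETXATTR = 1 << 4,
};

/* Placeholder with only the field the helper looks at. */
struct lookup_intent {
	int it_op;
};

/* True for the intent ops whose reply carries a full body (GETXATTR excluded). */
static inline bool it_has_reply_body(const struct lookup_intent *it)
{
	return it->it_op & (IT_OPEN | IT_UNLINK | IT_LOOKUP | IT_GETATTR);
}
```

Returning bool means any non-zero mask result collapses to true, so callers can use the helper directly in a condition, as mdc_finish_enqueue() does in the hunk above.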
[PATCH 04/22] staging: lustre: ldlm: xattr locks are lost on mdt
From: Vitaly Fertman On the server side mdt_intent_getxattr() can return EFAULT if a buffer cannot be found, it is returned after lock_replace, where a new lock is installed into lockp. An error forces ldlm_lock_enqueue() to destroy the original lock, but ldlm_handle_enqueue0() drops the reference on the new lock. The xattr client code implied intent error is returned under a lock, which is immediately cancelled. Check if a lock obtained and cancel it properly for error cases. Note: we should support both cases for interop needs, an intent error under a lock and with a lock abort. Keep returning a lock with an intent error for interop purposes for now, to be dropped later when client will get old enough. make all intent ops to work through md_intent_lock: getxattr and layout, which should extract the intent error. Signed-off-by: Vitaly Fertman Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433 Seagate-bug-id: MRP-3072 MRP-3137 Reviewed-on: http://review.whamcloud.com/17220 Reviewed-by: Andrew Perepechko Reviewed-by: Andriy Skulysh Tested-by: Elena V. Gryaznova Reviewed-by: John L. 
Hammond Reviewed-by: Lai Siyao Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/include/obd.h | 3 +- drivers/staging/lustre/lustre/include/obd_class.h | 3 +- drivers/staging/lustre/lustre/llite/file.c| 16 ++--- drivers/staging/lustre/lustre/llite/xattr_cache.c | 75 --- drivers/staging/lustre/lustre/lmv/lmv_intent.c| 12 ++-- drivers/staging/lustre/lustre/lmv/lmv_obd.c | 7 +-- drivers/staging/lustre/lustre/mdc/mdc_internal.h | 4 +- drivers/staging/lustre/lustre/mdc/mdc_locks.c | 66 ++-- 8 files changed, 95 insertions(+), 91 deletions(-) diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h index ea6056b..48cf7ab 100644 --- a/drivers/staging/lustre/lustre/include/obd.h +++ b/drivers/staging/lustre/lustre/include/obd.h @@ -909,8 +909,7 @@ struct md_ops { const void *, size_t, umode_t, uid_t, gid_t, cfs_cap_t, __u64, struct ptlrpc_request **); int (*enqueue)(struct obd_export *, struct ldlm_enqueue_info *, - const union ldlm_policy_data *, - struct lookup_intent *, struct md_op_data *, + const union ldlm_policy_data *, struct md_op_data *, struct lustre_handle *, __u64); int (*getattr)(struct obd_export *, struct md_op_data *, struct ptlrpc_request **); diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h index 176b63e..a76f016 100644 --- a/drivers/staging/lustre/lustre/include/obd_class.h +++ b/drivers/staging/lustre/lustre/include/obd_class.h @@ -1241,7 +1241,6 @@ static inline int md_create(struct obd_export *exp, struct md_op_data *op_data, static inline int md_enqueue(struct obd_export *exp, struct ldlm_enqueue_info *einfo, const union ldlm_policy_data *policy, -struct lookup_intent *it, struct md_op_data *op_data, struct lustre_handle *lockh, __u64 extra_lock_flags) @@ -1250,7 +1249,7 @@ static inline int md_enqueue(struct obd_export *exp, EXP_CHECK_MD_OP(exp, enqueue); EXP_MD_COUNTER_INCREMENT(exp, enqueue); - rc = 
MDP(exp->exp_obd, enqueue)(exp, einfo, policy, it, op_data, lockh, + rc = MDP(exp->exp_obd, enqueue)(exp, einfo, policy, op_data, lockh, extra_lock_flags); return rc; } diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c index ca5faea..0026fde 100644 --- a/drivers/staging/lustre/lustre/llite/file.c +++ b/drivers/staging/lustre/lustre/llite/file.c @@ -2514,7 +2514,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) PFID(ll_inode2fid(inode)), flock.l_flock.pid, flags, einfo.ei_mode, flock.l_flock.start, flock.l_flock.end); - rc = md_enqueue(sbi->ll_md_exp, , , NULL, op_data, , + rc = md_enqueue(sbi->ll_md_exp, , , op_data, , flags); /* Restore the file lock type if not TEST lock. */ @@ -2527,7 +2527,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync) if (rc2 && file_lock->fl_type != F_UNLCK) { einfo.ei_mode = LCK_NL; - md_enqueue(sbi->ll_md_exp, , , NULL, op_data, + md_enqueue(sbi->ll_md_exp, , , op_data, , flags); rc = rc2; } @@ -3474,12 +3474,7 @@ static int ll_layout_refresh_locked(struct inode *inode) struct lookup_intent it; struct lustre_handle lockh;
[PATCH 01/22] staging: lustre: llite: initialize xattr->xe_namelen
When the allocation of xattr->xe_name was moved to kstrdup() setting
xattr->xe_namelen was dropped. This field is used in several parts of
the xattr cache code so it broke xattr handling. Initialize
xattr->xe_namelen when allocating xattr->xe_name succeeds. Also change
the debugging statement to really report the xattr name instead of
its length which wasn't even being set.

Fixes: b3dd8957c23a ("staging: lustre: lustre: llite: Use kstrdup")
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 4dc799d..ef66949 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -121,10 +121,12 @@ static int ll_xattr_cache_add(struct list_head *cache,
 
 	xattr->xe_name = kstrdup(xattr_name, GFP_NOFS);
 	if (!xattr->xe_name) {
-		CDEBUG(D_CACHE, "failed to alloc xattr name %u\n",
-		       xattr->xe_namelen);
+		CDEBUG(D_CACHE, "failed to alloc xattr name %s\n",
+		       xattr_name);
 		goto err_name;
 	}
+	xattr->xe_namelen = strlen(xattr_name) + 1;
+
 	xattr->xe_value = kmemdup(xattr_val, xattr_val_len, GFP_NOFS);
 	if (!xattr->xe_value)
 		goto err_value;
-- 
1.8.3.1
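The pattern being restored, keeping the stored length in step with the duplicated name, can be sketched in userspace as below. struct xattr_entry and xattr_entry_set_name() are hypothetical, and malloc()/memcpy() stand in for kstrdup(); the length includes the trailing NUL, matching the strlen(xattr_name) + 1 in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced cache entry: just the name and its stored length. */
struct xattr_entry {
	char *name;
	size_t namelen;	/* length of name including the trailing NUL */
};

/* Duplicate the name and record its length together; 0 on success, -1 on failure. */
static int xattr_entry_set_name(struct xattr_entry *e, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;	/* e->namelen is deliberately left untouched */
	memcpy(copy, name, len);
	e->name = copy;
	e->namelen = len;
	return 0;
}
```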
[PATCH 00/22] staging: lustre: llite: fix xattr handling
From: James Simmons

Lustre utilities and user land APIs heavily depend on special xattr handling.
Sadly much of the xattr handling for lustre client has been broken for a
while. This is all the fixes needed to make xattr handling work properly
with the latest kernels.

Bobi Jam (3):
  staging: lustre: llite: break up ll_setstripe_ea function
  staging: lustre: llite: return from ll_adjust_lum() if lump is NULL
  staging: lustre: llite: eat -EEXIST on setting trusted.lov

Dmitry Eremin (1):
  staging: lustre: llite: add support set_acl method in inode operations

James Simmons (9):
  staging: lustre: llite: initialize xattr->xe_namelen
  staging: lustre: llite: fix invalid size test in ll_setstripe_ea()
  staging: lustre: llite: remove newline in fullname strings
  staging: lustre: llite: record in stats attempted removal of lma/link xattr
  staging: lustre: llite: cleanup posix acl xattr code
  staging: lustre: llite: use proper types in the xattr code
  staging: lustre: llite: cleanup xattr code comments
  staging: lustre: llite: style changes in xattr.c
  staging: lustre: llite: correct removexattr detection

John L. Hammond (3):
  staging: lustre: llite: handle xattr cache refill race
  staging: lustre: llite: use xattr_handler name for ACLs
  staging: lustre: llite: remove unused parameters from md_{get,set}xattr()

Niu Yawei (2):
  staging: lustre: llite: refactor lustre.lov xattr handling
  staging: lustre: llite: add simple comment about lustre.lov xattrs

Robin Humble (1):
  staging: lustre: llite: Remove filtering of seclabel xattr

Vitaly Fertman (3):
  staging: lustre: obd: create it_has_reply_body()
  staging: lustre: obd: change debug reporting in lmv_enqueue()
  staging: lustre: ldlm: xattr locks are lost on mdt

 drivers/staging/lustre/lustre/include/obd.h        | 20 +-
 drivers/staging/lustre/lustre/include/obd_class.h  | 24 +--
 drivers/staging/lustre/lustre/llite/file.c         | 86 ++--
 .../staging/lustre/lustre/llite/llite_internal.h   | 4 +
 drivers/staging/lustre/lustre/llite/namei.c        | 10 +-
 drivers/staging/lustre/lustre/llite/xattr.c        | 231 -
 drivers/staging/lustre/lustre/llite/xattr_cache.c  | 83 +++-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c     | 12 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c        | 36 ++--
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   | 4 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c      | 68 --
 drivers/staging/lustre/lustre/mdc/mdc_request.c    | 34 +--
 12 files changed, 364 insertions(+), 248 deletions(-)

-- 
1.8.3.1
[PATCH 07/22] staging: lustre: llite: refactor lustre.lov xattr handling
From: Niu YaweiThe function ll_xattr_set() contains special code to handle the lustre specific xattr lustre.lov. Move all this code to a new function ll_setstripe_ea(). Signed-off-by: Bobi Jam Signed-off-by: Niu Yawei Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8998 Reviewed-on: https://review.whamcloud.com/24851 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Jinshan Xiong Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 131 +++- 1 file changed, 69 insertions(+), 62 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 55a19a5..1b462e4 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -186,6 +186,73 @@ static int get_hsm_state(struct inode *inode, u32 *hus_states) return rc; } +static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, + size_t size) +{ + struct inode *inode = d_inode(dentry); + int rc = 0; + + if (size != 0 && size < sizeof(struct lov_user_md)) + return -EINVAL; + + /* +* It is possible to set an xattr to a "" value of zero size. +* For this case we are going to treat it as a removal. +*/ + if (!size && lump) + lump = NULL; + + /* Attributes that are saved via getxattr will always have +* the stripe_offset as 0. Instead, the MDS should be +* allowed to pick the starting OST index. b=17846 +*/ + if (lump && lump->lmm_stripe_offset == 0) + lump->lmm_stripe_offset = -1; + + /* Avoid anyone directly setting the RELEASED flag. */ + if (lump && (lump->lmm_pattern & LOV_PATTERN_F_RELEASED)) { + /* Only if we have a released flag check if the file +* was indeed archived. 
+*/ + u32 state = HS_NONE; + + rc = get_hsm_state(inode, ); + if (rc) + return rc; + + if (!(state & HS_ARCHIVED)) { + CDEBUG(D_VFSTRACE, + "hus_states state = %x, pattern = %x\n", + state, lump->lmm_pattern); + /* +* Here the state is: real file is not +* archived but user is requesting to set +* the RELEASED flag so we mask off the +* released flag from the request +*/ + lump->lmm_pattern ^= LOV_PATTERN_F_RELEASED; + } + } + + if (lump && S_ISREG(inode->i_mode)) { + __u64 it_flags = FMODE_WRITE; + int lum_size; + + lum_size = ll_lov_user_md_size(lump); + if (lum_size < 0 || size < lum_size) + return 0; /* b=10667: ignore error */ + + rc = ll_lov_setstripe_ea_info(inode, dentry, it_flags, lump, + lum_size); + /* b=10667: rc always be 0 here for now */ + rc = 0; + } else if (S_ISDIR(inode->i_mode)) { + rc = ll_dir_setstripe(inode, lump, 0); + } + + return rc; +} + static int ll_xattr_set(const struct xattr_handler *handler, struct dentry *dentry, struct inode *inode, const char *name, const void *value, size_t size, @@ -198,73 +265,13 @@ static int ll_xattr_set(const struct xattr_handler *handler, PFID(ll_inode2fid(inode)), inode, name); if (!strcmp(name, "lov")) { - struct lov_user_md *lump = (struct lov_user_md *)value; int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR : LPROC_LL_SETXATTR; - int rc = 0; ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1); - if (size != 0 && size < sizeof(struct lov_user_md)) - return -EINVAL; - - /* -* It is possible to set an xattr to a "" value of zero size. -* For this case we are going to treat it as a removal. -*/ - if (!size && lump) - lump = NULL; - - /* Attributes that are saved via getxattr will always have -* the stripe_offset as 0. Instead, the MDS should be -* allowed to pick the starting OST index. b=17846 -*/ - if (lump && lump->lmm_stripe_offset == 0) - lump->lmm_stripe_offset = -1; - -
[PATCH 05/22] staging: lustre: llite: handle xattr cache refill race
From: "John L. Hammond"

In ll_xattr_cache_refill() if the xattr cache was invalid (and no
request was sent) then return -EAGAIN so that ll_getxattr_common()
caller will fetch the xattr from the MDT.

Signed-off-by: John L. Hammond
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10132
Reviewed-on: https://review.whamcloud.com/29654
Reviewed-by: Andreas Dilger
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr_cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr_cache.c b/drivers/staging/lustre/lustre/llite/xattr_cache.c
index 53dfaea..5da69ba0 100644
--- a/drivers/staging/lustre/lustre/llite/xattr_cache.c
+++ b/drivers/staging/lustre/lustre/llite/xattr_cache.c
@@ -357,7 +357,7 @@ static int ll_xattr_cache_refill(struct inode *inode)
 	if (unlikely(!req)) {
 		CDEBUG(D_CACHE, "cancelled by a parallel getxattr\n");
 		ll_intent_drop_lock(&oit);
-		rc = -EIO;
+		rc = -EAGAIN;
 		goto err_unlock;
 	}

-- 
1.8.3.1
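The retry idea in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: demo_cache, demo_cache_get(), demo_fetch_remote() and demo_getxattr() are hypothetical names, not Lustre functions, and the "remote" fetch is a stub. The point it shows is that -EAGAIN tells the caller "the cache could not answer, try the slow path", whereas -EIO would have been reported as a hard failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-inode xattr cache. */
struct demo_cache {
	bool valid;		/* false models "cancelled by a parallel getxattr" */
	const char *value;	/* cached attribute value when valid */
};

/* Returns the value length, or -EAGAIN if the cache cannot be used. */
static int demo_cache_get(const struct demo_cache *c, char *buf, size_t len)
{
	size_t n;

	if (!c->valid)
		return -EAGAIN;	/* not a hard error: caller should fall back */

	n = strlen(c->value);
	if (n > len)
		return -ERANGE;
	memcpy(buf, c->value, n);
	return (int)n;
}

/* Hypothetical stand-in for fetching the xattr from the server. */
static int demo_fetch_remote(char *buf, size_t len)
{
	static const char remote[] = "remote";
	size_t n = sizeof(remote) - 1;

	if (n > len)
		return -ERANGE;
	memcpy(buf, remote, n);
	return (int)n;
}

/* Try the cache first; only -EAGAIN triggers the remote fetch. */
static int demo_getxattr(const struct demo_cache *c, char *buf, size_t len)
{
	int rc = demo_cache_get(c, buf, len);

	if (rc == -EAGAIN)
		rc = demo_fetch_remote(buf, len);
	return rc;
}
```

Other errors, such as -ERANGE for a buffer that is too small, are passed through unchanged, so the fallback stays limited to the one case the commit message describes.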
[PATCH 10/22] staging: lustre: llite: return from ll_adjust_lum() if lump is NULL
From: Bobi Jam

No need to check several times if lump is NULL. Just test once and
return 0 if NULL.

Signed-off-by: Bobi Jam
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484
Reviewed-on: https://review.whamcloud.com/27126
Reviewed-by: Dmitry Eremin
Reviewed-by: Niu Yawei
Reviewed-by: James Simmons
Reviewed-by: Andreas Dilger
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 78ce85b..56ac07e 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -190,15 +190,18 @@ static int ll_adjust_lum(struct inode *inode, struct lov_user_md *lump)
 {
 	int rc = 0;

+	if (!lump)
+		return 0;
+
 	/* Attributes that are saved via getxattr will always have
 	 * the stripe_offset as 0. Instead, the MDS should be
 	 * allowed to pick the starting OST index. b=17846
 	 */
-	if (lump && lump->lmm_stripe_offset == 0)
+	if (lump->lmm_stripe_offset == 0)
 		lump->lmm_stripe_offset = -1;

 	/* Avoid anyone directly setting the RELEASED flag. */
-	if (lump && (lump->lmm_pattern & LOV_PATTERN_F_RELEASED)) {
+	if (lump->lmm_pattern & LOV_PATTERN_F_RELEASED) {
 		/* Only if we have a released flag check if the file
 		 * was indeed archived.
 		 */
-- 
1.8.3.1
Re: [PATCH v4 4/4] zram: introduce zram memory tracking
On 04/15/2018 08:31 PM, Minchan Kim wrote:
> zRam as swap is useful for small memory device. However, swap means
> those pages on zram are mostly cold pages due to VM's LRU algorithm.
> Especially, once init data for application are touched for launching,
> they tend to be not accessed any more and finally swapped out.
> zRAM can store such cold pages as compressed form but it's pointless
> to keep in memory. Better idea is app developers free them directly
> rather than remaining them on heap.
>
> This patch tell us last access time of each block of zram via
> "cat /sys/kernel/debug/zram/zram0/block_state".
>
> The output is as follows,
>       300    75.033841 .wh
>       301    63.806904 s..
>       302    63.806919 ..h
>
> First column is zram's block index and 3rh one represents symbol
> (s: same page w: written page to backing store h: huge page) of the
> block state. Second column represents usec time unit of the block
> was last accessed. So above example means the 300th block is accessed
> at 75.033851 second and it was huge so it was written to the backing
> store.
>
> Admin can leverage this information to catch cold|incompressible pages
> of process with *pagemap* once part of heaps are swapped out.
>
> Acked-by: Greg Kroah-Hartman
> Signed-off-by: Minchan Kim
> ---
>  Documentation/blockdev/zram.txt |  24 ++
>  drivers/block/zram/Kconfig      |  10 +++
>  drivers/block/zram/zram_drv.c   | 140 +---
>  drivers/block/zram/zram_drv.h   |   5 ++
>  4 files changed, 168 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 78db38d02bc9..45509c7d5716 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
>  User should set up backing device via /sys/block/zramX/backing_dev
>  before disksize setting.
>
> += memory tracking
> +
> +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> +zram block. It could be useful to catch cold or incompressible
> +pages of the proess with*pagemap.

?                process

> +If you enable the feature, you could see block state via
> +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> +
> +	  300    75.033841 .wh
> +	  301    63.806904 s..
> +	  302    63.806919 ..h
> +
> +First column is zram's block index.
> +Second column is access time.
> +Third column is state of the block.
> +(s: same page
> +w: written page to backing store
> +h: huge page)
> +
> +First line of above example says 300th block is accessed at 75.033841sec
> +and the block's state is huge so it is written back to the backing
> +storage. It's a debugging feature so anyone shouldn't rely on it to work
> +properly.
> +
>  Nitin Gupta
>  ngu...@vflare.org
> diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> index ac3a31d433b2..01090338fb47 100644
> --- a/drivers/block/zram/Kconfig
> +++ b/drivers/block/zram/Kconfig
> @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
>  	 /sys/block/zramX/backing_dev.
>
>  	 See zram.txt for more infomration.
> +
> +config ZRAM_MEMORY_TRACKING
> +	bool "Tracking zram block status"

	bool "Track zram block status"

although sometimes it is zRam or zRAM.

> +	depends on ZRAM && DEBUG_FS
> +	help
> +	  With this feature, admin can track the state of allocated block

	                                                            blocks

> +	  of zRAM. Admin could see the information via
> +	  /sys/kernel/debug/zram/zramX/block_state.
> +
> +	  See zram.txt for more information.

	  See Documentation/blockdev/zram.txt for more information.

-- 
~Randy
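For readers who want to consume block_state from a tool, here is a minimal userspace sketch of parsing one line. It assumes the three columns described in the quoted documentation (block index, access time in seconds, three state characters) are separated by whitespace; struct block_state and parse_block_state() are hypothetical names, and since the patch itself calls this a debugging feature the format should not be treated as stable.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One decoded line of block_state (hypothetical helper type). */
struct block_state {
	unsigned long index;	/* zram block index */
	double access_sec;	/* last access time, in seconds */
	bool same;		/* 's': same-filled page */
	bool written;		/* 'w': written to backing store */
	bool huge;		/* 'h': huge (incompressible) page */
};

/*
 * Parse a line such as "300 75.033841 .wh".
 * Returns 0 on success, -1 if the line does not have the expected shape.
 */
static int parse_block_state(const char *line, struct block_state *out)
{
	char flags[4];

	if (sscanf(line, "%lu %lf %3s", &out->index, &out->access_sec,
		   flags) != 3)
		return -1;
	if (strlen(flags) != 3)
		return -1;

	/* Each position holds either its letter or '.' for "not set". */
	out->same = flags[0] == 's';
	out->written = flags[1] == 'w';
	out->huge = flags[2] == 'h';
	return 0;
}
```

A caller would read the debugfs file line by line (for example with fgets()) and pass each line to parse_block_state(), then filter on huge or on an old access_sec to find cold or incompressible blocks.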
[PATCH 03/22] staging: lustre: obd: change debug reporting in lmv_enqueue()
From: Vitaly Fertman

Remove LL_IT2STR(it) from debug macros in lmv_enqueue(). The removal
makes it possible to simplify the md_enqueue() functions.

Signed-off-by: Vitaly Fertman
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7433
Seagate-bug-id: MRP-3072 MRP-3137
Reviewed-on: http://review.whamcloud.com/17220
Reviewed-by: Andrew Perepechko
Reviewed-by: Andriy Skulysh
Tested-by: Elena V. Gryaznova
Reviewed-by: John L. Hammond
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/lmv/lmv_obd.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 7be9310..e1c93cd 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1660,15 +1660,14 @@ static int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	struct lmv_obd *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *tgt;

-	CDEBUG(D_INODE, "ENQUEUE '%s' on " DFID "\n",
-	       LL_IT2STR(it), PFID(&op_data->op_fid1));
+	CDEBUG(D_INODE, "ENQUEUE on " DFID "\n", PFID(&op_data->op_fid1));

 	tgt = lmv_locate_mds(lmv, op_data, &op_data->op_fid1);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);

-	CDEBUG(D_INODE, "ENQUEUE '%s' on " DFID " -> mds #%u\n",
-	       LL_IT2STR(it), PFID(&op_data->op_fid1), tgt->ltd_idx);
+	CDEBUG(D_INODE, "ENQUEUE on " DFID " -> mds #%u\n",
+	       PFID(&op_data->op_fid1), tgt->ltd_idx);

 	return md_enqueue(tgt->ltd_exp, einfo, policy, it, op_data, lockh,
 			  extra_lock_flags);
-- 
1.8.3.1
[PATCH 06/22] staging: lustre: llite: Remove filtering of seclabel xattr
From: Robin Humble

The security.capability xattr is used to implement File Capabilities in
recent Linux versions. Capabilities are a fine grained approach to
granting executables elevated privileges. eg. /bin/ping can have
capabilities cap_net_admin, cap_net_raw+ep instead of being setuid root.

This xattr has long been filtered out by llite, initially for stability
reasons (b15587), and later over performance concerns as this xattr is
read for every file with eg. 'ls --color'. Since LU-2869 xattr's are
cached on clients, alleviating most performance concerns.

Removing llite's filtering of the security.capability xattr enables
using Lustre as a root filesystem, which is used on some large clusters.

Signed-off-by: Robin Humble
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9562
Reviewed-on: https://review.whamcloud.com/27292
Reviewed-by: John L. Hammond
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 2d78432..55a19a5 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -117,11 +117,6 @@ static int xattr_type_filter(struct ll_sb_info *sbi,
 	     (handler->flags == XATTR_LUSTRE_T && !strcmp(name, "lov"
 		return 0;

-	/* b15587: ignore security.capability xattr for now */
-	if ((handler->flags == XATTR_SECURITY_T &&
-	     !strcmp(name, "capability")))
-		return 0;
-
 	/* LU-549: Disable security.selinux when selinux is disabled */
 	if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
 	    strcmp(name, "selinux") == 0)
@@ -383,10 +378,6 @@ static int ll_xattr_get_common(const struct xattr_handler *handler,
 	if (rc)
 		return rc;

-	/* b15587: ignore security.capability xattr for now */
-	if ((handler->flags == XATTR_SECURITY_T && !strcmp(name, "capability")))
-		return -ENODATA;
-
 	/* LU-549: Disable security.selinux when selinux is disabled */
 	if (handler->flags == XATTR_SECURITY_T && !selinux_is_enabled() &&
 	    !strcmp(name, "selinux"))
-- 
1.8.3.1
[PATCH 09/22] staging: lustre: llite: break up ll_setstripe_ea function
From: Bobi Jam

Place all the handling of information of trusted.lov that is not
stripe related into the new function ll_adjust_lum(). Now
ll_setstripe_ea() only handles striping information.

Signed-off-by: Bobi Jam
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484
Reviewed-on: https://review.whamcloud.com/27126
Reviewed-by: Dmitry Eremin
Reviewed-by: Niu Yawei
Reviewed-by: James Simmons
Reviewed-by: Andreas Dilger
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 37 +++--
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index c1600b9..78ce85b 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -186,22 +186,10 @@ static int get_hsm_state(struct inode *inode, u32 *hus_states)
 	return rc;
 }

-static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
-			   size_t size)
+static int ll_adjust_lum(struct inode *inode, struct lov_user_md *lump)
 {
-	struct inode *inode = d_inode(dentry);
 	int rc = 0;

-	if (size != 0 && size < sizeof(struct lov_user_md))
-		return -EINVAL;
-
-	/*
-	 * It is possible to set an xattr to a "" value of zero size.
-	 * For this case we are going to treat it as a removal.
-	 */
-	if (!size && lump)
-		lump = NULL;
-
 	/* Attributes that are saved via getxattr will always have
 	 * the stripe_offset as 0. Instead, the MDS should be
 	 * allowed to pick the starting OST index. b=17846
@@ -234,6 +222,29 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
 		}
 	}

+	return rc;
+}
+
+static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
+			   size_t size)
+{
+	struct inode *inode = d_inode(dentry);
+	int rc = 0;
+
+	if (size != 0 && size < sizeof(struct lov_user_md))
+		return -EINVAL;
+
+	/*
+	 * It is possible to set an xattr to a "" value of zero size.
+	 * For this case we are going to treat it as a removal.
+	 */
+	if (!size && lump)
+		lump = NULL;
+
+	rc = ll_adjust_lum(inode, lump);
+	if (rc)
+		return rc;
+
 	if (lump && S_ISREG(inode->i_mode)) {
 		__u64 it_flags = FMODE_WRITE;
 		int lum_size;
-- 
1.8.3.1
[PATCH 14/22] staging: lustre: llite: record in stats attempted removal of lma/link xattr
Keep track of attempted deletions as well as changing of the lma/link
xattrs.

Signed-off-by: James Simmons
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin
Reviewed-by: Bob Glossman
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 4b1e565..3ab7ae0 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -296,7 +296,10 @@ static int ll_xattr_set(const struct xattr_handler *handler,
 		return ll_setstripe_ea(dentry, (struct lov_user_md *)value,
 				       size);
 	} else if (!strcmp(name, "lma") || !strcmp(name, "link")) {
-		ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_SETXATTR, 1);
+		int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR :
+						       LPROC_LL_SETXATTR;
+
+		ll_stats_ops_tally(ll_i2sbi(inode), op_type, 1);
 		return 0;
 	}

-- 
1.8.3.1
[PATCH 12/22] staging: lustre: llite: fix invalid size test in ll_setstripe_ea()
The size check at the start of ll_setstripe_ea() is only valid for a
directory. Move that check to the section of code handling the S_ISDIR
case.

Signed-off-by: James Simmons
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin
Reviewed-by: Bob Glossman
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 69c5227..42a6fb4 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -234,9 +234,6 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
 	struct inode *inode = d_inode(dentry);
 	int rc = 0;

-	if (size != 0 && size < sizeof(struct lov_user_md))
-		return -EINVAL;
-
 	/*
 	 * It is possible to set an xattr to a "" value of zero size.
 	 * For this case we are going to treat it as a removal.
@@ -269,6 +266,9 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump,
 		if (rc == -EEXIST)
 			rc = 0;
 	} else if (S_ISDIR(inode->i_mode)) {
+		if (size != 0 && size < sizeof(struct lov_user_md))
+			return -EINVAL;
+
 		rc = ll_dir_setstripe(inode, lump, 0);
 	}

-- 
1.8.3.1
[PATCH 08/22] staging: lustre: llite: add simple comment about lustre.lov xattrs
From: Niu YaweiSimple comment added to ll_xattr_set. Signed-off-by: Bobi Jam Signed-off-by: Niu Yawei Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8998 Reviewed-on: https://review.whamcloud.com/24851 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Jinshan Xiong Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 1b462e4..c1600b9 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -264,6 +264,7 @@ static int ll_xattr_set(const struct xattr_handler *handler, CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), xattr %s\n", PFID(ll_inode2fid(inode)), inode, name); + /* lustre/trusted.lov.xxx would be passed through xattr API */ if (!strcmp(name, "lov")) { int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR : LPROC_LL_SETXATTR; -- 1.8.3.1
[PATCH 11/22] staging: lustre: llite: eat -EEXIST on setting trusted.lov
From: Bobi JamTools like rsync, tar, cp may copy and restore the xattrs on a file. The client previously ignored the setting of trusted.lov/lustre.lov if the layout had already been specified, to avoid causing these tools to fail for no reason. For PFL files we still need to silently eat -EEXIST on setting these attributes to avoid problems. Signed-off-by: Bobi Jam Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484 Reviewed-on: https://review.whamcloud.com/27126 Reviewed-by: Dmitry Eremin Reviewed-by: Niu Yawei Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 56ac07e..69c5227 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -254,12 +254,20 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, lum_size = ll_lov_user_md_size(lump); if (lum_size < 0 || size < lum_size) - return 0; /* b=10667: ignore error */ + return -ERANGE; rc = ll_lov_setstripe_ea_info(inode, dentry, it_flags, lump, lum_size); - /* b=10667: rc always be 0 here for now */ - rc = 0; + /** +* b=10667: ignore -EEXIST. +* Silently eat error on setting trusted.lov/lustre.lov +* attribute for platforms that added the default option +* to copy all attributes in 'cp' command. Both rsync and +* tar --xattrs also will try to set LOVEA for existing +* files. +*/ + if (rc == -EEXIST) + rc = 0; } else if (S_ISDIR(inode->i_mode)) { rc = ll_dir_setstripe(inode, lump, 0); } -- 1.8.3.1
[PATCH 12/22] staging: lustre: llite: fix invalid size test in ll_setstripe_ea()
The size check at the start of ll_setstripe_ea() is only valid for a directory. Move that check to the section of code handling the S_ISDIR case. Signed-off-by: James Simmons Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183 Reviewed-on: https://review.whamcloud.com/27240 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 69c5227..42a6fb4 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -234,9 +234,6 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, struct inode *inode = d_inode(dentry); int rc = 0; - if (size != 0 && size < sizeof(struct lov_user_md)) - return -EINVAL; - /* * It is possible to set an xattr to a "" value of zero size. * For this case we are going to treat it as a removal. @@ -269,6 +266,9 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, if (rc == -EEXIST) rc = 0; } else if (S_ISDIR(inode->i_mode)) { + if (size != 0 && size < sizeof(struct lov_user_md)) + return -EINVAL; + rc = ll_dir_setstripe(inode, lump, 0); } -- 1.8.3.1
[PATCH 08/22] staging: lustre: llite: add simple comment about lustre.lov xattrs
From: Niu Yawei Simple comment added to ll_xattr_set. Signed-off-by: Bobi Jam Signed-off-by: Niu Yawei Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8998 Reviewed-on: https://review.whamcloud.com/24851 Reviewed-by: Andreas Dilger Reviewed-by: Lai Siyao Reviewed-by: Jinshan Xiong Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 1b462e4..c1600b9 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -264,6 +264,7 @@ static int ll_xattr_set(const struct xattr_handler *handler, CDEBUG(D_VFSTRACE, "VFS Op:inode=" DFID "(%p), xattr %s\n", PFID(ll_inode2fid(inode)), inode, name); + /* lustre/trusted.lov.xxx would be passed through xattr API */ if (!strcmp(name, "lov")) { int op_type = flags == XATTR_REPLACE ? LPROC_LL_REMOVEXATTR : LPROC_LL_SETXATTR; -- 1.8.3.1
[PATCH 11/22] staging: lustre: llite: eat -EEXIST on setting trusted.lov
From: Bobi Jam Tools like rsync, tar, cp may copy and restore the xattrs on a file. The client previously ignored the setting of trusted.lov/lustre.lov if the layout had already been specified, to avoid causing these tools to fail for no reason. For PFL files we still need to silently eat -EEXIST on setting these attributes to avoid problems. Signed-off-by: Bobi Jam Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9484 Reviewed-on: https://review.whamcloud.com/27126 Reviewed-by: Dmitry Eremin Reviewed-by: Niu Yawei Reviewed-by: James Simmons Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 56ac07e..69c5227 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -254,12 +254,20 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, lum_size = ll_lov_user_md_size(lump); if (lum_size < 0 || size < lum_size) - return 0; /* b=10667: ignore error */ + return -ERANGE; rc = ll_lov_setstripe_ea_info(inode, dentry, it_flags, lump, lum_size); - /* b=10667: rc always be 0 here for now */ - rc = 0; + /** +* b=10667: ignore -EEXIST. +* Silently eat error on setting trusted.lov/lustre.lov +* attribute for platforms that added the default option +* to copy all attributes in 'cp' command. Both rsync and +* tar --xattrs also will try to set LOVEA for existing +* files. +*/ + if (rc == -EEXIST) + rc = 0; } else if (S_ISDIR(inode->i_mode)) { rc = ll_dir_setstripe(inode, lump, 0); } -- 1.8.3.1
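The error policy this patch describes can be sketched in userspace as follows. This is only an illustration under stated assumptions: fake_setstripe() is a hypothetical stand-in for ll_lov_setstripe_ea_info(), and set_layout_xattr() is an invented name, not a Lustre function. It shows a too-small buffer being reported as -ERANGE and -EEXIST being swallowed, with any other error still propagated.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for ll_lov_setstripe_ea_info(): reports -EEXIST
 * when the file already has a layout, 0 otherwise.
 */
int fake_setstripe(int has_layout)
{
	return has_layout ? -EEXIST : 0;
}

/*
 * Sketch of the policy from the patch: an undersized buffer is a real
 * error (-ERANGE), while -EEXIST is ignored so that cp, rsync and
 * tar --xattrs do not fail when they restore a layout xattr.
 */
int set_layout_xattr(long lum_size, size_t size, int has_layout)
{
	int rc;

	if (lum_size < 0 || size < (size_t)lum_size)
		return -ERANGE;

	rc = fake_setstripe(has_layout);
	if (rc == -EEXIST)
		rc = 0;

	return rc;
}
```

Only -EEXIST is mapped to success here; the earlier behaviour of forcing rc to 0 unconditionally would also have hidden unrelated failures.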
[PATCH 13/22] staging: lustre: llite: remove newline in fullname strings
In creating the full name of a xattr a new line was added that was seen by the remote MDS server which confused it. Remove the newline. Signed-off-by: James Simmons Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183 Reviewed-on: https://review.whamcloud.com/27240 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 42a6fb4..4b1e565 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -136,7 +136,7 @@ static int xattr_type_filter(struct ll_sb_info *sbi, return -EPERM; } - fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name); + fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name); if (!fullname) return -ENOMEM; rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), @@ -435,7 +435,7 @@ static int ll_xattr_get_common(const struct xattr_handler *handler, if (handler->flags == XATTR_ACL_DEFAULT_T && !S_ISDIR(inode->i_mode)) return -ENODATA; #endif - fullname = kasprintf(GFP_KERNEL, "%s%s\n", handler->prefix, name); + fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name); if (!fullname) return -ENOMEM; rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size, -- 1.8.3.1
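To see why the stray newline matters, here is a small userspace sketch. It uses snprintf() in place of the kernel's kasprintf(), and server_knows() is a hypothetical exact-match lookup standing in for whatever the MDS does with the name; both helper names are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<prefix><name>" into buf with no trailing newline.
 * Returns 0 on success, -1 if the result did not fit.
 */
int build_fullname(char *buf, size_t len, const char *prefix,
		   const char *name)
{
	int n = snprintf(buf, len, "%s%s", prefix, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical server-side lookup that only matches exact names. */
int server_knows(const char *fullname)
{
	return strcmp(fullname, "user.comment") == 0;
}
```

With the "%s%s\n" format the string sent over the wire would be "user.comment\n", which an exact comparison like the one above does not match.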
[PATCH 19/22] staging: lustre: llite: add support set_acl method in inode operations
From: Dmitry Eremin Linux kernel v3.14 adds set_acl method to inode operations. This patch adds support to Lustre for proper acl management. Signed-off-by: Dmitry Eremin Signed-off-by: John L. Hammond Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183 Reviewed-on: https://review.whamcloud.com/25965 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10541 Reviewed-on: https://review.whamcloud.com/ Reviewed-by: Bob Glossman Reviewed-by: James Simmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/file.c | 67 ++ .../staging/lustre/lustre/llite/llite_internal.h | 4 ++ drivers/staging/lustre/lustre/llite/namei.c| 10 +++- 3 files changed, 79 insertions(+), 2 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c index 0026fde..35f5bda 100644 --- a/drivers/staging/lustre/lustre/llite/file.c +++ b/drivers/staging/lustre/lustre/llite/file.c @@ -3030,6 +3030,7 @@ static int ll_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, return rc; } +#ifdef CONFIG_FS_POSIX_ACL struct posix_acl *ll_get_acl(struct inode *inode, int type) { struct ll_inode_info *lli = ll_i2info(inode); @@ -3043,6 +3044,69 @@ struct posix_acl *ll_get_acl(struct inode *inode, int type) return acl; } +int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type) +{ + struct ll_sb_info *sbi = ll_i2sbi(inode); + struct ptlrpc_request *req = NULL; + const char *name = NULL; + size_t value_size = 0; + char *value = NULL; + int rc; + + switch (type) { + case ACL_TYPE_ACCESS: + name = XATTR_NAME_POSIX_ACL_ACCESS; + if (acl) { + rc = posix_acl_update_mode(inode, &inode->i_mode, &acl); + if (rc) + goto out; + } + + break; + + case ACL_TYPE_DEFAULT: + name = XATTR_NAME_POSIX_ACL_DEFAULT; + if (!S_ISDIR(inode->i_mode)) { + rc = acl ? 
-EACCES : 0; + goto out; + } + + break; + + default: + rc = -EINVAL; + goto out; + } + + if (acl) { + value_size = posix_acl_xattr_size(acl->a_count); + value = kmalloc(value_size, GFP_NOFS); + if (!value) { + rc = -ENOMEM; + goto out; + } + + rc = posix_acl_to_xattr(&init_user_ns, acl, value, value_size); + if (rc < 0) + goto out_value; + } + + rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), +value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM, +name, value, value_size, 0, 0, 0, &req); + + ptlrpc_req_finished(req); +out_value: + kfree(value); +out: + if (!rc) + set_cached_acl(inode, type, acl); + else + forget_cached_acl(inode, type); + return rc; +} +#endif /* CONFIG_FS_POSIX_ACL */ + int ll_inode_permission(struct inode *inode, int mask) { struct ll_sb_info *sbi; @@ -3164,7 +3228,10 @@ int ll_inode_permission(struct inode *inode, int mask) .permission = ll_inode_permission, .listxattr = ll_listxattr, .fiemap = ll_fiemap, +#ifdef CONFIG_FS_POSIX_ACL .get_acl= ll_get_acl, + .set_acl= ll_set_acl, +#endif }; /* dynamic ioctl number support routines */ diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h index 6504850..2280327 100644 --- a/drivers/staging/lustre/lustre/llite/llite_internal.h +++ b/drivers/staging/lustre/lustre/llite/llite_internal.h @@ -754,7 +754,11 @@ enum ldlm_mode ll_take_md_lock(struct inode *inode, __u64 bits, int ll_md_real_close(struct inode *inode, fmode_t fmode); int ll_getattr(const struct path *path, struct kstat *stat, u32 request_mask, unsigned int flags); +#ifdef CONFIG_FS_POSIX_ACL struct posix_acl *ll_get_acl(struct inode *inode, int type); +int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type); +#endif /* CONFIG_FS_POSIX_ACL */ + int ll_migrate(struct inode *parent, struct file *file, int mdtidx, const char *name, int namelen); int ll_get_fid_by_name(struct inode *parent, const char *name, diff --git a/drivers/staging/lustre/lustre/llite/namei.c 
b/drivers/staging/lustre/lustre/llite/namei.c index 6c9ec46..d7c4c58 100644 --- a/drivers/staging/lustre/lustre/llite/namei.c +++ b/drivers/staging/lustre/lustre/llite/namei.c @@ -1190,7 +1190,10 @@ static int ll_rename(struct inode *src, struct dentry *src_dchild, .getattr= ll_getattr, .permission = ll_inode_permission, .listxattr
[PATCH 20/22] staging: lustre: llite: use xattr_handler name for ACLs
From: "John L. Hammond" If struct xattr_handler has a name member then use it (rather than prefix) for the ACL xattrs. This avoids a bug where ACL operations failed for some kernels. Signed-off-by: John L. Hammond Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10785 Reviewed-on: https://review.whamcloud.com/ Reviewed-by: Dmitry Eremin Reviewed-by: James Simmons Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index d08bf1e..e835c8e 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -46,15 +46,16 @@ const struct xattr_handler *get_xattr_type(const char *name) { - int i = 0; + int i; - while (ll_xattr_handlers[i]) { - size_t len = strlen(ll_xattr_handlers[i]->prefix); + for (i = 0; ll_xattr_handlers[i]; i++) { + const char *prefix = xattr_prefix(ll_xattr_handlers[i]); + size_t prefix_len = strlen(prefix); - if (!strncmp(ll_xattr_handlers[i]->prefix, name, len)) + if (!strncmp(prefix, name, prefix_len)) return ll_xattr_handlers[i]; - i++; } + return NULL; } @@ -627,14 +628,14 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) }; static const struct xattr_handler ll_acl_access_xattr_handler = { - .prefix = XATTR_NAME_POSIX_ACL_ACCESS, + .name = XATTR_NAME_POSIX_ACL_ACCESS, .flags = XATTR_ACL_ACCESS_T, .get = ll_xattr_get_common, .set = ll_xattr_set_common, }; static const struct xattr_handler ll_acl_default_xattr_handler = { - .prefix = XATTR_NAME_POSIX_ACL_DEFAULT, + .name = XATTR_NAME_POSIX_ACL_DEFAULT, .flags = XATTR_ACL_DEFAULT_T, .get = ll_xattr_get_common, .set = ll_xattr_set_common, -- 1.8.3.1
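The lookup change can be illustrated with a userspace sketch. struct handler below is a cut-down, hypothetical model of struct xattr_handler, and handler_prefix() imitates what the kernel's xattr_prefix() helper does (fall back to .name when .prefix is unset); the table contents and flag values are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical model of struct xattr_handler. */
struct handler {
	const char *name;	/* full xattr name, e.g. for ACLs */
	const char *prefix;	/* namespace prefix, e.g. "user." */
	int flags;
};

/* Use .prefix when set, otherwise fall back to .name. */
const char *handler_prefix(const struct handler *h)
{
	return h->prefix ? h->prefix : h->name;
}

static const struct handler user_handler = {
	.prefix = "user.",
	.flags = 1,
};

static const struct handler acl_access_handler = {
	.name = "system.posix_acl_access",
	.flags = 2,
};

/* NULL-terminated table, like ll_xattr_handlers[]. */
static const struct handler *handlers[] = {
	&user_handler,
	&acl_access_handler,
	NULL,
};

const struct handler *find_handler(const char *name)
{
	size_t i;

	for (i = 0; handlers[i]; i++) {
		const char *prefix = handler_prefix(handlers[i]);

		if (!strncmp(prefix, name, strlen(prefix)))
			return handlers[i];
	}

	return NULL;
}
```

Reading .prefix directly, as the old loop did, would pass NULL to strlen() for a handler that only sets .name, which is why the lookup and the ACL handler definitions change together.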
[PATCH 18/22] staging: lustre: llite: style changes in xattr.c
Small style changes to better match the kernel coding standard and make the code more readable. Signed-off-by: James Simmons Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183 Reviewed-on: https://review.whamcloud.com/27240 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 835d00f..d08bf1e 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -81,11 +81,10 @@ static int xattr_type_filter(struct ll_sb_info *sbi, return 0; } -static int -ll_xattr_set_common(const struct xattr_handler *handler, - struct dentry *dentry, struct inode *inode, - const char *name, const void *value, size_t size, - int flags) +static int ll_xattr_set_common(const struct xattr_handler *handler, + struct dentry *dentry, struct inode *inode, + const char *name, const void *value, size_t size, + int flags) { struct ll_sb_info *sbi = ll_i2sbi(inode); struct ptlrpc_request *req = NULL; @@ -139,9 +138,9 @@ static int xattr_type_filter(struct ll_sb_info *sbi, fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name); if (!fullname) return -ENOMEM; - rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), -valid, fullname, pv, size, 0, flags, -ll_i2suppgid(inode), &req); + + rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode), valid, fullname, +pv, size, 0, flags, ll_i2suppgid(inode), &req); kfree(fullname); if (rc) { if (rc == -EOPNOTSUPP && handler->flags == XATTR_USER_T) { @@ -307,9 +306,8 @@ static int ll_xattr_set(const struct xattr_handler *handler, flags); } -int -ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, - size_t size, u64 valid) +int ll_xattr_list(struct inode *inode, const char *name, int type, void 
*buffer, + size_t size, u64 valid) { struct ll_inode_info *lli = ll_i2info(inode); struct ll_sb_info *sbi = ll_i2sbi(inode); @@ -439,6 +437,7 @@ static int ll_xattr_get_common(const struct xattr_handler *handler, fullname = kasprintf(GFP_KERNEL, "%s%s", handler->prefix, name); if (!fullname) return -ENOMEM; + rc = ll_xattr_list(inode, fullname, handler->flags, buffer, size, OBD_MD_FLXATTR); kfree(fullname); @@ -562,6 +561,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size) OBD_MD_FLXATTRLS); if (rc < 0) return rc; + /* * If we're being called to get the size of the xattr list * (size == 0) then just assume that a lustre.lov xattr -- 1.8.3.1
[PATCH 16/22] staging: lustre: llite: use proper types in the xattr code
Convert __uXX types to uXX types since this is kernel code. The function ll_lov_user_md_size() returns ssize_t so change lum_size from int to ssize_t. Signed-off-by: James Simmons Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183 Reviewed-on: https://review.whamcloud.com/27240 Reviewed-by: Dmitry Eremin Reviewed-by: Bob Glossman Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- drivers/staging/lustre/lustre/llite/xattr.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c index 147ffcc..d6cee3b 100644 --- a/drivers/staging/lustre/lustre/llite/xattr.c +++ b/drivers/staging/lustre/lustre/llite/xattr.c @@ -91,7 +91,7 @@ static int xattr_type_filter(struct ll_sb_info *sbi, struct ptlrpc_request *req = NULL; const char *pv = value; char *fullname; - __u64 valid; + u64 valid; int rc; if (flags == XATTR_REPLACE) { @@ -246,8 +246,8 @@ static int ll_setstripe_ea(struct dentry *dentry, struct lov_user_md *lump, return rc; if (lump && S_ISREG(inode->i_mode)) { - __u64 it_flags = FMODE_WRITE; - int lum_size; + u64 it_flags = FMODE_WRITE; + ssize_t lum_size; lum_size = ll_lov_user_md_size(lump); if (lum_size < 0 || size < lum_size) @@ -309,7 +309,7 @@ static int ll_xattr_set(const struct xattr_handler *handler, int ll_xattr_list(struct inode *inode, const char *name, int type, void *buffer, - size_t size, __u64 valid) + size_t size, u64 valid) { struct ll_inode_info *lli = ll_i2info(inode); struct ll_sb_info *sbi = ll_i2sbi(inode); -- 1.8.3.1
[PATCH 17/22] staging: lustre: llite: cleanup xattr code comments
Add proper punctuation to the comments. Change buf_size to size for comment
in ll_listxattr() since buf_size doesn't exist, which will confuse someone
reading the code.

Signed-off-by: James Simmons
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin
Reviewed-by: Bob Glossman
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index d6cee3b..835d00f 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -564,7 +564,7 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		return rc;
 	/*
 	 * If we're being called to get the size of the xattr list
-	 * (buf_size == 0) then just assume that a lustre.lov xattr
+	 * (size == 0) then just assume that a lustre.lov xattr
 	 * exists.
 	 */
 	if (!size)
@@ -577,14 +577,14 @@ ssize_t ll_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		len = strnlen(xattr_name, rem - 1) + 1;
 		rem -= len;
 		if (!xattr_type_filter(sbi, get_xattr_type(xattr_name))) {
-			/* Skip OK xattr type leave it in buffer */
+			/* Skip OK xattr type, leave it in buffer. */
 			xattr_name += len;
 			continue;
 		}

 		/*
 		 * Move up remaining xattrs in buffer
-		 * removing the xattr that is not OK
+		 * removing the xattr that is not OK.
 		 */
 		memmove(xattr_name, xattr_name + len, rem);
 		rc -= len;
--
1.8.3.1
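The loop whose comments are touched above compacts a listxattr-style buffer
in place. As a rough user-space sketch of that technique (not the Lustre
code itself): names are stored back to back, each NUL-terminated, and any
name that fails a filter is removed by sliding the rest of the buffer down
with memmove(). filter_xattr_names(), name_hidden_fn and hide_trusted() are
hypothetical names for this illustration, and the sketch assumes the buffer
ends with a NUL byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: returns nonzero if the name must be hidden. */
typedef int (*name_hidden_fn)(const char *name);

/*
 * Compact a listxattr-style buffer in place. "buf" holds "len" bytes of
 * back-to-back NUL-terminated names; names for which hidden() returns
 * nonzero are removed. Returns the new length in bytes.
 */
static size_t filter_xattr_names(char *buf, size_t len, name_hidden_fn hidden)
{
	char *name = buf;
	size_t rem = len;	/* bytes from "name" to the end of the list */

	while (rem > 0) {
		size_t n = strlen(name) + 1;	/* name plus its NUL */

		rem -= n;
		if (!hidden(name)) {
			/* Keep this name and step over it. */
			name += n;
			continue;
		}
		/* Slide the remaining names down over the hidden one. */
		memmove(name, name + n, rem);
		len -= n;
	}
	return len;
}

/* Example policy for the demo: hide names in the "trusted." namespace. */
static int hide_trusted(const char *name)
{
	return strncmp(name, "trusted.", 8) == 0;
}
```

Note that "name" is not advanced after a removal, because the next name has
just been moved into its place.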
[PATCH 21/22] staging: lustre: llite: correct removexattr detection
In ll_xattr_set_common() detect the removexattr() case correctly by testing
for a NULL value as well as XATTR_REPLACE.

Signed-off-by: John L. Hammond
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10787
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin
Reviewed-by: James Simmons
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index e835c8e..1a597a6 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -94,7 +94,11 @@ static int ll_xattr_set_common(const struct xattr_handler *handler,
 	u64 valid;
 	int rc;

-	if (flags == XATTR_REPLACE) {
+	/* When setxattr() is called with a size of 0 the value is
+	 * unconditionally replaced by "". When removexattr() is
+	 * called we get a NULL value and XATTR_REPLACE for flags.
+	 */
+	if (!value && flags == XATTR_REPLACE) {
 		ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_REMOVEXATTR, 1);
 		valid = OBD_MD_FLXATTRRM;
 	} else {
--
1.8.3.1
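To make the distinction in the new comment concrete, here is a small
user-space sketch of the predicate. It is an illustration under stated
assumptions, not the kernel code: DEMO_XATTR_REPLACE stands in for the
kernel's XATTR_REPLACE flag, and demo_classify() and the enum are
hypothetical names. The point is that a replace with an empty but non-NULL
value is still a set, and only a NULL value together with the replace flag
is treated as a removal.

```c
#include <stddef.h>

/* Stand-in for the kernel's XATTR_REPLACE flag; value assumed for the demo. */
#define DEMO_XATTR_REPLACE 0x2

enum demo_xattr_op { DEMO_XATTR_SET, DEMO_XATTR_REMOVE };

/* Classify a handler call the way the patched condition does. */
static enum demo_xattr_op demo_classify(const void *value, int flags)
{
	if (!value && flags == DEMO_XATTR_REPLACE)
		return DEMO_XATTR_REMOVE;
	return DEMO_XATTR_SET;
}
```

With the old test (flags alone), the second case in a call such as
demo_classify("", DEMO_XATTR_REPLACE) would have been counted as a removal.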
[PATCH 22/22] staging: lustre: llite: remove unused parameters from md_{get,set}xattr()
From: "John L. Hammond"

md_getxattr() and md_setxattr() each have several unused parameters. Remove
them and improve the naming of the remaining parameters.

Signed-off-by: John L. Hammond
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10792
Reviewed-on: https://review.whamcloud.com/
Reviewed-by: Dmitry Eremin
Reviewed-by: James Simmons
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/include/obd.h       |  7 ++---
 drivers/staging/lustre/lustre/include/obd_class.h | 21 ++
 drivers/staging/lustre/lustre/llite/file.c        |  5 ++--
 drivers/staging/lustre/lustre/llite/xattr.c       |  6 ++--
 drivers/staging/lustre/lustre/lmv/lmv_obd.c       | 22 +++
 drivers/staging/lustre/lustre/mdc/mdc_request.c   | 34 +--
 6 files changed, 46 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index 48cf7ab..0f9e5dc 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -935,12 +935,11 @@ struct md_ops {
 			struct ptlrpc_request **);

 	int (*setxattr)(struct obd_export *, const struct lu_fid *,
-			u64, const char *, const char *, int, int, int, __u32,
-			struct ptlrpc_request **);
+			u64, const char *, const void *, size_t, unsigned int,
+			u32, struct ptlrpc_request **);

 	int (*getxattr)(struct obd_export *, const struct lu_fid *,
-			u64, const char *, const char *, int, int, int,
-			struct ptlrpc_request **);
+			u64, const char *, size_t, struct ptlrpc_request **);

 	int (*init_ea_size)(struct obd_export *, u32, u32);

diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index a76f016..0081578 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -1385,29 +1385,26 @@ static inline int md_merge_attr(struct obd_export *exp,
 }

 static inline int md_setxattr(struct obd_export *exp, const struct lu_fid *fid,
-			      u64 valid, const char *name,
-			      const char *input, int input_size,
-			      int output_size, int flags, __u32 suppgid,
+			      u64 obd_md_valid, const char *name,
+			      const char *value, size_t value_size,
+			      unsigned int xattr_flags, u32 suppgid,
 			      struct ptlrpc_request **request)
 {
 	EXP_CHECK_MD_OP(exp, setxattr);
 	EXP_MD_COUNTER_INCREMENT(exp, setxattr);
-	return MDP(exp->exp_obd, setxattr)(exp, fid, valid, name, input,
-					   input_size, output_size, flags,
+	return MDP(exp->exp_obd, setxattr)(exp, fid, obd_md_valid, name,
+					   value, value_size, xattr_flags,
 					   suppgid, request);
 }

 static inline int md_getxattr(struct obd_export *exp, const struct lu_fid *fid,
-			      u64 valid, const char *name,
-			      const char *input, int input_size,
-			      int output_size, int flags,
-			      struct ptlrpc_request **request)
+			      u64 obd_md_valid, const char *name,
+			      size_t buf_size, struct ptlrpc_request **req)
 {
 	EXP_CHECK_MD_OP(exp, getxattr);
 	EXP_MD_COUNTER_INCREMENT(exp, getxattr);
-	return MDP(exp->exp_obd, getxattr)(exp, fid, valid, name, input,
-					   input_size, output_size, flags,
-					   request);
+	return MDP(exp->exp_obd, getxattr)(exp, fid, obd_md_valid, name,
+					   buf_size, req);
 }

 static inline int md_set_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 35f5bda..9197891 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -3093,7 +3093,7 @@ int ll_set_acl(struct inode *inode, struct posix_acl *acl, int type)

 	rc = md_setxattr(sbi->ll_md_exp, ll_inode2fid(inode),
 			 value ? OBD_MD_FLXATTR : OBD_MD_FLXATTRRM,
-			 name, value, value_size, 0, 0, 0, &req);
+			 name, value, value_size, 0, 0, &req);

 	ptlrpc_req_finished(req);
 out_value:
@@ -3405,8 +3405,7 @@ static int ll_layout_fetch(struct inode *inode, struct ldlm_lock *lock)
 	rc = ll_get_default_mdsize(sbi, );
 	if (rc == 0)
 		rc = md_getxattr(sbi->ll_md_exp, ll_inode2fid(inode),
-
[PATCH 15/22] staging: lustre: llite: cleanup posix acl xattr code
Having an extra ifdef makes the code harder to read. For the case of
ll_xattr_get_common() we have a variable initialized at the start of the
function but it is only used in the XATTR_ACL_ACCESS_T code block. Let's
move that variable to that location since it's only used there and make the
code look cleaner.

Signed-off-by: James Simmons
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9183
Reviewed-on: https://review.whamcloud.com/27240
Reviewed-by: Dmitry Eremin
Reviewed-by: Bob Glossman
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lustre/llite/xattr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/xattr.c b/drivers/staging/lustre/lustre/llite/xattr.c
index 3ab7ae0..147ffcc 100644
--- a/drivers/staging/lustre/lustre/llite/xattr.c
+++ b/drivers/staging/lustre/lustre/llite/xattr.c
@@ -396,9 +396,6 @@ static int ll_xattr_get_common(const struct xattr_handler *handler,
 			       const char *name, void *buffer, size_t size)
 {
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
-#ifdef CONFIG_FS_POSIX_ACL
-	struct ll_inode_info *lli = ll_i2info(inode);
-#endif
 	char *fullname;
 	int rc;

@@ -422,6 +419,7 @@ static int ll_xattr_get_common(const struct xattr_handler *handler,
 	 * chance that cached ACL is uptodate.
 	 */
 	if (handler->flags == XATTR_ACL_ACCESS_T) {
+		struct ll_inode_info *lli = ll_i2info(inode);
 		struct posix_acl *acl;

 		spin_lock(&lli->lli_lock);
--
1.8.3.1
[PATCH 01/25] staging: lustre: libcfs: remove useless CPU partition code
From: Dmitry Eremin

* remove the scratch buffer and the mutex which guards it.
* remove the global cpumask and the spinlock which guards it.
* remove cpt_version, which was used to check for CPU state changes
  during setup, because CPU state changes are now simply disabled
  during setup.
* remove the whole global struct cfs_cpt_data cpt_data.
* remove a few unused APIs.

Signed-off-by: Dmitry Eremin
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-8703
Reviewed-on: https://review.whamcloud.com/23303
Reviewed-on: https://review.whamcloud.com/25048
Reviewed-by: James Simmons
Reviewed-by: Doug Oucharek
Reviewed-by: Andreas Dilger
Reviewed-by: Olaf Weber
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 .../lustre/include/linux/libcfs/libcfs_cpu.h       |  13 +--
 .../lustre/include/linux/libcfs/linux/linux-cpu.h  |   2 -
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c    |  18 +---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 114 +++--
 4 files changed, 20 insertions(+), 127 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 61bce77..1f2cd78 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -162,12 +162,12 @@ struct cfs_cpt_table {
  * return 1 if successfully set all CPUs, otherwise return 0
  */
 int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab,
-			int cpt, cpumask_t *mask);
+			int cpt, const cpumask_t *mask);
 /**
  * remove all cpus in \a mask from CPU partition \a cpt
  */
 void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
-			   int cpt, cpumask_t *mask);
+			   int cpt, const cpumask_t *mask);
 /**
  * add all cpus in NUMA node \a node to CPU partition \a cpt
  * return 1 if successfully set all CPUs, otherwise return 0
@@ -190,20 +190,11 @@ int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
 void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
 			    int cpt, nodemask_t *mask);
 /**
- * unset all cpus for CPU partition \a cpt
- */
-void cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt);
-/**
  * convert partition id \a cpt to numa node id, if there are more than one
  * nodes in this partition, it might return a different node id each time.
  */
 int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);

-/**
- * return number of HTs in the same core of \a cpu
- */
-int cfs_cpu_ht_nsiblings(int cpu);
-
 /*
  * allocate per-cpu-partition data, returned value is an array of pointers,
  * variable can be indexed by CPU ID.
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
index 6035376..e8bbbaa 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
@@ -58,8 +58,6 @@ struct cfs_cpu_partition {

 /** descriptor for CPU partitions */
 struct cfs_cpt_table {
-	/* version, reserved for hotplug */
-	unsigned int			ctb_version;
 	/* spread rotor for NUMA allocator */
 	unsigned int			ctb_spread_rotor;
 	/* # of CPU partitions */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 76291a3..705abf2 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -129,14 +129,15 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_unset_cpu);

 int
-cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, const cpumask_t *mask)
 {
 	return 1;
 }
 EXPORT_SYMBOL(cfs_cpt_set_cpumask);

 void
-cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
+cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt,
+		      const cpumask_t *mask)
 {
 }
 EXPORT_SYMBOL(cfs_cpt_unset_cpumask);
@@ -167,12 +168,6 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_unset_nodemask);

-void
-cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
-{
-}
-EXPORT_SYMBOL(cfs_cpt_clear);
-
 int
 cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
 {
@@ -181,13 +176,6 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_spread_node);

 int
-cfs_cpu_ht_nsiblings(int cpu)
-{
-	return 1;
-}
-EXPORT_SYMBOL(cfs_cpu_ht_nsiblings);
-
-int
 cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
 {
 	return 0;
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 388521e..134b239 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -64,30 +64,6 @@ module_param(cpu_pattern, charp, 0444);
[PATCH 03/25] staging: lustre: libcfs: implement cfs_cpt_cpumask for UMP case
From: Amir Shehata

The function cfs_cpt_cpumask() exists for SMP systems but when CONFIG_SMP is
disabled it only returns NULL. Fill in this missing function. Also properly
initialize ctb_mask for the UMP case.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h | 16 +---
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c          |  9 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 1f2cd78..070f8fe 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -77,10 +77,6 @@

 #ifdef CONFIG_SMP
 /**
- * return cpumask of CPU partition \a cpt
- */
-cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
-/**
  * print string information of cpt-table
  */
 int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
@@ -89,19 +85,13 @@ struct cfs_cpt_table {
 	/* # of CPU partitions */
 	int ctb_nparts;
 	/* cpu mask */
-	cpumask_t ctb_mask;
+	cpumask_var_t ctb_mask;
 	/* node mask */
 	nodemask_t ctb_nodemask;
 	/* version */
 	u64 ctb_version;
 };

-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
-{
-	return NULL;
-}
-
 static inline int
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
@@ -133,6 +123,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt);
 /**
+ * return cpumask of CPU partition \a cpt
+ */
+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
+/**
  * return nodemask of CPU partition \a cpt
  */
 nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt);
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 705abf2..5ea294f 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -54,6 +54,9 @@ struct cfs_cpt_table *
 	cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
 	if (cptab) {
 		cptab->ctb_version = CFS_CPU_VERSION_MAGIC;
+		if (!zalloc_cpumask_var(&cptab->ctb_mask, GFP_NOFS))
+			return NULL;
+		cpumask_set_cpu(0, cptab->ctb_mask);
 		node_set(0, cptab->ctb_nodemask);
 		cptab->ctb_nparts = ncpt;
 	}
@@ -108,6 +111,12 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_online);

+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+	return &cptab->ctb_mask;
+}
+EXPORT_SYMBOL(cfs_cpt_cpumask);
+
 nodemask_t *
 cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
 {
--
1.8.3.1
[PATCH 06/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids
From: Amir Shehata

Move from num_possible_cpus() to nr_cpu_ids.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index b2a88ef..741db69 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -105,14 +105,14 @@ struct cfs_cpt_table *
 	    !cptab->ctb_nodemask)
 		goto failed;

-	cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+	cptab->ctb_cpu2cpt = kvmalloc_array(nr_cpu_ids,
 					    sizeof(cptab->ctb_cpu2cpt[0]),
 					    GFP_KERNEL);
 	if (!cptab->ctb_cpu2cpt)
 		goto failed;

 	memset(cptab->ctb_cpu2cpt, -1,
-	       num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+	       nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));

 	cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
 					  GFP_KERNEL);
--
1.8.3.1
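The commit message does not say why the switch matters, so the following is
only one plausible reading, sketched in user space. In the kernel,
num_possible_cpus() is the number of bits set in the possible-CPU mask,
while nr_cpu_ids is the highest possible CPU id plus one. When the mask is
sparse the two differ, and an array indexed by CPU id (as ctb_cpu2cpt
appears to be) needs the larger value. demo_weight() and demo_nr_ids() are
hypothetical helpers that model the two quantities on a plain bitmask.

```c
#include <stddef.h>

/* Number of set bits: the analogue of num_possible_cpus(). */
static unsigned int demo_weight(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n += (unsigned int)(mask & 1UL);
	return n;
}

/* Highest set bit plus one: the analogue of nr_cpu_ids. */
static unsigned int demo_nr_ids(unsigned long mask)
{
	unsigned int ids = 0;

	for (; mask; mask >>= 1)
		ids++;
	return ids;
}
```

For a mask with only CPUs 0 and 4 possible (0x11), the weight is 2 but an
id-indexed array needs 5 slots, so sizing it by the weight would let an
access for CPU 4 run past the end.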
[PATCH 08/25] staging: lustre: libcfs: add cpu distance handling
From: Amir Shehata

Add functionality to calculate the distance between two CPTs. Expose those
distances in debugfs so people deploying a setup can debug what is being
created for CPTs.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 .../lustre/include/linux/libcfs/libcfs_cpu.h       |  8 +++
 .../lustre/include/linux/libcfs/linux/linux-cpu.h  |  4 ++
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c    | 21
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 59 ++
 4 files changed, 92 insertions(+)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 839ec02..c0922fc 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -110,6 +110,10 @@ struct cfs_cpt_table {
  */
 struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
 /**
+ * print distance information of cpt-table
+ */
+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len);
+/**
  * return total number of CPU partitions in \a cptab
  */
 int
@@ -143,6 +147,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_of_node(struct cfs_cpt_table *cptab, int node);
 /**
+ * NUMA distance between \a cpt1 and \a cpt2 in \a cptab
+ */
+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2);
+/**
  * bind current thread on a CPU-partition \a cpt of \a cptab
  */
 int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
index 1bed0ba..4ac1670 100644
--- a/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/linux/linux-cpu.h
@@ -52,6 +52,8 @@ struct cfs_cpu_partition {
 	cpumask_var_t			cpt_cpumask;
 	/* nodes mask for this partition */
 	nodemask_t			*cpt_nodemask;
+	/* NUMA distance between CPTs */
+	unsigned int			*cpt_distance;
 	/* spread rotor for NUMA allocator */
 	unsigned int			cpt_spread_rotor;
 };
@@ -60,6 +62,8 @@ struct cfs_cpu_partition {
 struct cfs_cpt_table {
 	/* spread rotor for NUMA allocator */
 	unsigned int			ctb_spread_rotor;
+	/* maximum NUMA distance between all nodes in table */
+	unsigned int			ctb_distance;
 	/* # of CPU partitions */
 	unsigned int			ctb_nparts;
 	/* partitions tables */
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index e6d1512..7ac2796 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -41,6 +41,8 @@

 #define CFS_CPU_VERSION_MAGIC	   0xbabecafe

+#define CFS_CPT_DISTANCE	   1	/* Arbitrary positive value */
+
 struct cfs_cpt_table *
 cfs_cpt_table_alloc(unsigned int ncpt)
 {
@@ -90,6 +92,19 @@ struct cfs_cpt_table *
 EXPORT_SYMBOL(cfs_cpt_table_print);
 #endif /* CONFIG_SMP */

+int cfs_cpt_distance_print(struct cfs_cpt_table *cptab, char *buf, int len)
+{
+	int rc;
+
+	rc = snprintf(buf, len, "0\t: 0:%d\n", CFS_CPT_DISTANCE);
+	len -= rc;
+	if (len <= 0)
+		return -EFBIG;
+
+	return rc;
+}
+EXPORT_SYMBOL(cfs_cpt_distance_print);
+
 int
 cfs_cpt_number(struct cfs_cpt_table *cptab)
 {
@@ -124,6 +139,12 @@ cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
 }
 EXPORT_SYMBOL(cfs_cpt_nodemask);

+unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
+{
+	return CFS_CPT_DISTANCE;
+}
+EXPORT_SYMBOL(cfs_cpt_distance);
+
 int
 cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
 {
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index fd0c451..1e184b1 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -76,6 +76,7 @@
 		struct cfs_cpu_partition *part = &cptab->ctb_parts[i];

 		kfree(part->cpt_nodemask);
+		kfree(part->cpt_distance);
 		free_cpumask_var(part->cpt_cpumask);
 	}

@@ -137,6 +138,12 @@ struct cfs_cpt_table *
 		if (!zalloc_cpumask_var(&part->cpt_cpumask, GFP_NOFS) ||
 		    !part->cpt_nodemask)
 			goto failed;
+
+		part->cpt_distance = kvmalloc_array(cptab->ctb_nparts,
+
[PATCH 03/25] staging: lustre: libcfs: implement cfs_cpt_cpumask for UMP case
From: Amir Shehata

The function cfs_cpt_cpumask() exists for SMP systems, but when
CONFIG_SMP is disabled it only returns NULL. Fill in this missing
function. Also properly initialize ctb_mask for the UMP case.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h | 16 +---
 drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c          |  9 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
index 1f2cd78..070f8fe 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_cpu.h
@@ -77,10 +77,6 @@
 #ifdef CONFIG_SMP
 /**
- * return cpumask of CPU partition \a cpt
- */
-cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
-/**
  * print string information of cpt-table
  */
 int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
@@ -89,19 +85,13 @@ struct cfs_cpt_table {
 	/* # of CPU partitions */
 	int ctb_nparts;
 	/* cpu mask */
-	cpumask_t ctb_mask;
+	cpumask_var_t ctb_mask;
 	/* node mask */
 	nodemask_t ctb_nodemask;
 	/* version */
 	u64 ctb_version;
 };

-static inline cpumask_var_t *
-cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
-{
-	return NULL;
-}
-
 static inline int
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
@@ -133,6 +123,10 @@ struct cfs_cpt_table {
  */
 int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt);
 /**
+ * return cpumask of CPU partition \a cpt
+ */
+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
+/**
  * return nodemask of CPU partition \a cpt
  */
 nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt);
diff --git a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
index 705abf2..5ea294f 100644
--- a/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/libcfs_cpu.c
@@ -54,6 +54,9 @@ struct cfs_cpt_table *
 	cptab = kzalloc(sizeof(*cptab), GFP_NOFS);
 	if (cptab) {
 		cptab->ctb_version = CFS_CPU_VERSION_MAGIC;
+		if (!zalloc_cpumask_var(&cptab->ctb_mask, GFP_NOFS))
+			return NULL;
+		cpumask_set_cpu(0, cptab->ctb_mask);
 		node_set(0, cptab->ctb_nodemask);
 		cptab->ctb_nparts = ncpt;
 	}
@@ -108,6 +111,12 @@ struct cfs_cpt_table *
 }
 EXPORT_SYMBOL(cfs_cpt_online);

+cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
+{
+	return &cptab->ctb_mask;
+}
+EXPORT_SYMBOL(cfs_cpt_cpumask);
+
 nodemask_t *
 cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
 {
--
1.8.3.1
[PATCH 06/25] staging: lustre: libcfs: replace num_possible_cpus() with nr_cpu_ids
From: Amir Shehata

Move from num_possible_cpus() to nr_cpu_ids.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index b2a88ef..741db69 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -105,14 +105,14 @@ struct cfs_cpt_table *
 	    !cptab->ctb_nodemask)
 		goto failed;

-	cptab->ctb_cpu2cpt = kvmalloc_array(num_possible_cpus(),
+	cptab->ctb_cpu2cpt = kvmalloc_array(nr_cpu_ids,
 					    sizeof(cptab->ctb_cpu2cpt[0]),
 					    GFP_KERNEL);
 	if (!cptab->ctb_cpu2cpt)
 		goto failed;

 	memset(cptab->ctb_cpu2cpt, -1,
-	       num_possible_cpus() * sizeof(cptab->ctb_cpu2cpt[0]));
+	       nr_cpu_ids * sizeof(cptab->ctb_cpu2cpt[0]));

 	cptab->ctb_parts = kvmalloc_array(ncpt, sizeof(cptab->ctb_parts[0]),
 					  GFP_KERNEL);
--
1.8.3.1
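The commit message does not say why the size changes, so the following is only a plausible reading: in the kernel, num_possible_cpus() counts the bits set in the possible-CPU mask, while nr_cpu_ids is the highest possible CPU id plus one. A table indexed by CPU id, such as ctb_cpu2cpt, has to cover the highest id, and the two values differ when the ids are sparse. Below is a minimal userspace sketch of that difference. The names cpu_mask_t, count_possible(), id_limit(), alloc_cpu2cpt() and set_cpt() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the possible-CPU mask: bit n set means CPU id n may exist. */
typedef uint64_t cpu_mask_t;

/* Analogue of num_possible_cpus(): the number of set bits in the mask. */
static unsigned int count_possible(cpu_mask_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Analogue of nr_cpu_ids: index of the highest set bit plus one (0 if empty). */
static unsigned int id_limit(cpu_mask_t mask)
{
	unsigned int n = 0;

	while (mask) {
		n++;
		mask >>= 1;
	}
	return n;
}

/* Allocate a cpu -> cpt table that any possible CPU id can index, filled with -1. */
static int *alloc_cpu2cpt(cpu_mask_t mask)
{
	unsigned int n = id_limit(mask);
	int *tbl = malloc(n * sizeof(tbl[0]));

	if (!tbl)
		return NULL;
	memset(tbl, -1, n * sizeof(tbl[0]));
	return tbl;
}

/* Record the partition of one CPU; the id must be below the table limit. */
static void set_cpt(int *tbl, unsigned int limit, unsigned int cpu, int cpt)
{
	assert(cpu < limit);
	tbl[cpu] = cpt;
}
```

With possible ids {0, 2, 5, 7}, count_possible() gives 4 but id_limit() gives 8, so a table of four entries could not be indexed by CPU 5 or CPU 7.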
[PATCH 09/25] staging: lustre: libcfs: use distance in cpu and node handling
From: Amir Shehata

Take into consideration the location of NUMA nodes and cores
when calling cfs_cpt_[un]set_cpu() and cfs_cpt_[un]set_node().
This enables functioning on platforms with 100s of cores and
NUMA nodes.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 192 +++--
 1 file changed, 143 insertions(+), 49 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index 1e184b1..bbf89b8 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -300,11 +300,134 @@ unsigned int cfs_cpt_distance(struct cfs_cpt_table *cptab, int cpt1, int cpt2)
 }
 EXPORT_SYMBOL(cfs_cpt_distance);

+/*
+ * Calculate the maximum NUMA distance between all nodes in the
+ * from_mask and all nodes in the to_mask.
+ */
+static unsigned int cfs_cpt_distance_calculate(nodemask_t *from_mask,
+					       nodemask_t *to_mask)
+{
+	unsigned int maximum;
+	unsigned int distance;
+	int from;
+	int to;
+
+	maximum = 0;
+	for_each_node_mask(from, *from_mask) {
+		for_each_node_mask(to, *to_mask) {
+			distance = node_distance(from, to);
+			if (maximum < distance)
+				maximum = distance;
+		}
+	}
+	return maximum;
+}
+
+static void cfs_cpt_add_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+	cptab->ctb_cpu2cpt[cpu] = cpt;
+
+	cpumask_set_cpu(cpu, cptab->ctb_cpumask);
+	cpumask_set_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+}
+
+static void cfs_cpt_del_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
+{
+	cpumask_clear_cpu(cpu, cptab->ctb_parts[cpt].cpt_cpumask);
+	cpumask_clear_cpu(cpu, cptab->ctb_cpumask);
+
+	cptab->ctb_cpu2cpt[cpu] = -1;
+}
+
+static void cfs_cpt_add_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+	struct cfs_cpu_partition *part;
+
+	if (!node_isset(node, *cptab->ctb_nodemask)) {
+		unsigned int dist;
+
+		/* first time node is added to the CPT table */
+		node_set(node, *cptab->ctb_nodemask);
+		cptab->ctb_node2cpt[node] = cpt;
+
+		dist = cfs_cpt_distance_calculate(cptab->ctb_nodemask,
+						  cptab->ctb_nodemask);
+		cptab->ctb_distance = dist;
+	}
+
+	part = &cptab->ctb_parts[cpt];
+	if (!node_isset(node, *part->cpt_nodemask)) {
+		int cpt2;
+
+		/* first time node is added to this CPT */
+		node_set(node, *part->cpt_nodemask);
+		for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+			struct cfs_cpu_partition *part2;
+			unsigned int dist;
+
+			part2 = &cptab->ctb_parts[cpt2];
+			dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+							  part2->cpt_nodemask);
+			part->cpt_distance[cpt2] = dist;
+			dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+							  part->cpt_nodemask);
+			part2->cpt_distance[cpt] = dist;
+		}
+	}
+}
+
+static void cfs_cpt_del_node(struct cfs_cpt_table *cptab, int cpt, int node)
+{
+	struct cfs_cpu_partition *part = &cptab->ctb_parts[cpt];
+	int cpu;
+
+	for_each_cpu(cpu, part->cpt_cpumask) {
+		/* this CPT has other CPU belonging to this node? */
+		if (cpu_to_node(cpu) == node)
+			break;
+	}
+
+	if (cpu >= nr_cpu_ids && node_isset(node, *part->cpt_nodemask)) {
+		int cpt2;
+
+		/* No more CPUs in the node for this CPT. */
+		node_clear(node, *part->cpt_nodemask);
+		for (cpt2 = 0; cpt2 < cptab->ctb_nparts; cpt2++) {
+			struct cfs_cpu_partition *part2;
+			unsigned int dist;
+
+			part2 = &cptab->ctb_parts[cpt2];
+			if (node_isset(node, *part2->cpt_nodemask))
+				cptab->ctb_node2cpt[node] = cpt2;
+
+			dist = cfs_cpt_distance_calculate(part->cpt_nodemask,
+							  part2->cpt_nodemask);
+			part->cpt_distance[cpt2] = dist;
+			dist = cfs_cpt_distance_calculate(part2->cpt_nodemask,
+
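The distance calculation in the patch above takes the maximum node distance over every (from, to) pair drawn from two node masks. A small userspace sketch of the same idea follows. It assumes a fixed four-node system, and the node_dist table, the node_mask_t type and the max_distance() name are all hypothetical. The kernel code uses node_distance() and nodemask_t instead.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 4

/* Hypothetical distance table: 10 for the local node, larger values for remote nodes. */
static const unsigned int node_dist[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the set. */
typedef uint32_t node_mask_t;

/* Maximum distance over all pairs (from, to); 0 if either mask is empty. */
static unsigned int max_distance(node_mask_t from_mask, node_mask_t to_mask)
{
	unsigned int maximum = 0;
	int from;
	int to;

	/* Only the low MAX_NODES bits are valid node ids in this sketch. */
	assert((from_mask >> MAX_NODES) == 0 && (to_mask >> MAX_NODES) == 0);

	for (from = 0; from < MAX_NODES; from++) {
		if (!(from_mask & (1u << from)))
			continue;
		for (to = 0; to < MAX_NODES; to++) {
			if (!(to_mask & (1u << to)))
				continue;
			if (maximum < node_dist[from][to])
				maximum = node_dist[from][to];
		}
	}
	return maximum;
}
```

For example, with one partition on nodes {0, 1} and another on nodes {2, 3}, the pairwise distances in this table are 30, 40, 20 and 30, so the result is 40. Taking the maximum gives a worst-case figure for traffic between the two partitions.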
[PATCH 11/25] staging: lustre: libcfs: invert error handling for cfs_cpt_table_print
From: Amir Shehata

Instead of setting rc to -EFBIG for several cases in the loop, let's
initialize rc to -EFBIG and just break out of the loop in case of
failure. Just set rc to zero once we successfully finish the loop.

Signed-off-by: Amir Shehata
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-7734
Reviewed-on: http://review.whamcloud.com/18916
Reviewed-by: Olaf Weber
Reviewed-by: Doug Oucharek
Signed-off-by: James Simmons
---
 .../staging/lustre/lnet/libcfs/linux/linux-cpu.c   | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
index bbf89b8..6d8dcd3 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-cpu.c
@@ -158,29 +158,26 @@ struct cfs_cpt_table *
 cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
 {
 	char *tmp = buf;
-	int rc = 0;
+	int rc = -EFBIG;
 	int i;
 	int j;

 	for (i = 0; i < cptab->ctb_nparts; i++) {
-		if (len > 0) {
-			rc = snprintf(tmp, len, "%d\t:", i);
-			len -= rc;
-		}
+		if (len <= 0)
+			goto out;
+
+		rc = snprintf(tmp, len, "%d\t:", i);
+		len -= rc;

-		if (len <= 0) {
-			rc = -EFBIG;
+		if (len <= 0)
 			goto out;
-		}

 		tmp += rc;
 		for_each_cpu(j, cptab->ctb_parts[i].cpt_cpumask) {
-			rc = snprintf(tmp, len, "%d ", j);
+			rc = snprintf(tmp, len, " %d", j);
 			len -= rc;
-			if (len <= 0) {
-				rc = -EFBIG;
+			if (len <= 0)
 				goto out;
-			}
 			tmp += rc;
 		}

@@ -189,6 +186,7 @@ struct cfs_cpt_table *
 		len--;
 	}

+	rc = 0;
 out:
 	if (rc < 0)
 		return rc;
--
1.8.3.1
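The "assume failure, clear it on success" pattern described in the commit message can be sketched in userspace as follows. This is an illustration under assumptions, not the patch's code: struct part and table_print() are hypothetical, and the sketch keeps the snprintf() return value in a separate variable n so that the preset -EFBIG in rc is never overwritten before the loop completes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical partition description: a list of CPU ids. */
struct part {
	const int *cpus;
	int ncpus;
};

/*
 * Print one "<index>\t: <cpu> <cpu> ...\n" row per partition into buf.
 * Returns the number of bytes written (excluding the NUL), or -EFBIG
 * if the buffer is too small.
 */
static int table_print(const struct part *parts, int nparts, char *buf, int len)
{
	char *tmp = buf;
	int rc = -EFBIG;	/* assume failure until the loop completes */
	int n;
	int i;
	int j;

	assert(buf != NULL);

	for (i = 0; i < nparts; i++) {
		if (len <= 0)
			goto out;

		n = snprintf(tmp, (size_t)len, "%d\t:", i);
		if (n < 0 || n >= len)	/* error or truncated output */
			goto out;
		tmp += n;
		len -= n;

		for (j = 0; j < parts[i].ncpus; j++) {
			n = snprintf(tmp, (size_t)len, " %d", parts[i].cpus[j]);
			if (n < 0 || n >= len)
				goto out;
			tmp += n;
			len -= n;
		}

		if (len < 2)	/* need room for '\n' and the NUL */
			goto out;
		*tmp++ = '\n';
		*tmp = '\0';
		len--;
	}

	rc = 0;			/* every row fit */
out:
	if (rc < 0)
		return rc;
	return (int)(tmp - buf);
}
```

Every failure path is a plain `goto out`, so the error code is assigned in one place only, and a single `rc = 0` after the loop marks success.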