Re: (was Fair Trade O.S.) 0yZ Varanger Sys
Now also added 0yZ to its title, meaning 0 triune god, and 0 jesus on a cross. - I think that is what everybody here wants! Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Wednesday 9. October 2019 kl. 09:52, Ywe Cærlyn wrote: > Ok, I think I have a fully and complete view of operating systems philosophy. > > Basically a good O.S. philosophy can be traced back to 1000 A.D. and the > Saxons forcing the bible, and "god" on people in Norway. Still Norway never > really did not become un-varangian. > Today contemplating a background on a good O.S. based on my research, I > understand that varangian culture was quite good, and building further on it, > ridding oneself of the christian trinitarian god, we can also have a good O.S. > > Therefore the system is now named Varanger Sys, with Cider as the original > drink of Tór. And the EDM culture of the 90s that Norway was largely > influental in, I have named Úpp Varanger EDM, that went all the way up to > Kygo (and Maren), that represents types already established in the 90s, which > I was part of. Very Norwegian, Scandinavian, probably European, and maybe > other places. And we fully support EU. > > This is the final basis of my operating system philosophy, and cultural > aspect. Small changes may come, if it improves things. > > Best Greetings, > Ywe Cærlyn > Lead & Philosophy > Varanger Sys > > https://www.youtube.com/channel/UCR3gmLVjHS5A702wo4bol_Q
Re: [PATCH bpf] libbpf: fix passing uninitialized bytes to setsockopt
On Sat, Oct 12, 2019 at 9:52 PM Ilya Maximets wrote:
>
> 'struct xdp_umem_reg' has 4 bytes of padding at the end that makes
> valgrind complain about passing uninitialized stack memory to the
> syscall:
>
>   Syscall param socketcall.setsockopt() points to uninitialised byte(s)
>     at 0x4E7AB7E: setsockopt (in /usr/lib64/libc-2.29.so)
>     by 0x4BDE035: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:172)
>   Uninitialised value was created by a stack allocation
>     at 0x4BDDEBA: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:140)
>
> Padding bytes appeared after introducing of a new 'flags' field.
>
> Fixes: 10d30e301732 ("libbpf: add flags to umem config")
> Signed-off-by: Ilya Maximets

Something is not right with (e|g)mail.
This is 3rd email I got with the same patch.
First one (the one that was applied) was 3 days ago.
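The valgrind report quoted above comes down to tail padding in a stack-allocated struct. The following is a minimal sketch of the problem and of one common remedy, not the libbpf source: 'struct umem_reg_like' is a hypothetical stand-in that mirrors the field layout described for 'struct xdp_umem_reg', and fill_reg() and count_nonzero_tail() are illustrative helpers. Zeroing the whole object before assigning members is one way to avoid handing uninitialized bytes to a syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: two 64-bit fields followed by three 32-bit
 * fields. On common 64-bit ABIs the compiler adds 4 bytes of tail
 * padding so that sizeof() is a multiple of 8. */
struct umem_reg_like {
	uint64_t addr;
	uint64_t len;
	uint32_t chunk_size;
	uint32_t headroom;
	uint32_t flags;
};

/* Zero the whole object (padding included) before setting members. */
static void fill_reg(struct umem_reg_like *mr, uint64_t addr, uint64_t len,
		     uint32_t chunk_size, uint32_t headroom, uint32_t flags)
{
	memset(mr, 0, sizeof(*mr));
	mr->addr = addr;
	mr->len = len;
	mr->chunk_size = chunk_size;
	mr->headroom = headroom;
	mr->flags = flags;
}

/* Count non-zero bytes located after the last member (the tail padding). */
static size_t count_nonzero_tail(const struct umem_reg_like *mr)
{
	const unsigned char *p = (const unsigned char *)mr;
	size_t start = offsetof(struct umem_reg_like, flags) + sizeof(mr->flags);
	size_t n = 0;

	for (size_t i = start; i < sizeof(*mr); i++)
		if (p[i])
			n++;
	return n;
}
```

Assigning the members one by one without the memset() would leave whatever was on the stack in the padding bytes, which is what a tool like valgrind flags when the struct is passed by pointer and size.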
[PATCH] writeback: Fix a warning while "make xmldocs"
This patch fixes the following warning:

./fs/fs-writeback.c:918: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'

Signed-off-by: Masanari Iida
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e88421d9a48d..8461a6322039 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -905,7 +905,7 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
  * cgroup_writeback_by_id - initiate cgroup writeback from bdi and memcg IDs
  * @bdi_id: target bdi id
  * @memcg_id: target memcg css id
- * @nr_pages: number of pages to write, 0 for best-effort dirty flushing
+ * @nr: number of pages to write, 0 for best-effort dirty flushing
  * @reason: reason why some writeback work initiated
  * @done: target wb_completion
  *
-- 
2.23.0.526.g70bf0b755af4
Re: [PATCH] mt76: mt76x2: disable pcie_aspm by default
Hi Lorenzo,

I love your patch! Yet something to improve:

[auto build test ERROR on wireless-drivers-next/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Lorenzo-Bianconi/mt76-mt76x2-disable-pcie_aspm-by-default/20191013-093134
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git master
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot

All errors (new ones prefixed by >>):

>> drivers/net/wireless/mediatek/mt76/mmio.c:7:10: fatal error: linux/pci-aspm.h: No such file or directory
    #include <linux/pci-aspm.h>
             ^~
   compilation terminated.

vim +7 drivers/net/wireless/mediatek/mt76/mmio.c

   > 7	#include <linux/pci-aspm.h>
     8	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH] ACPI Documentation: Minor Spelling Fix
Very minor spelling fix in ACPI documentation

Signed-off-by: James Pack
---
 Documentation/firmware-guide/acpi/namespace.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/firmware-guide/acpi/namespace.rst b/Documentation/firmware-guide/acpi/namespace.rst
index 835521baeb89..3eb763d6656d 100644
--- a/Documentation/firmware-guide/acpi/namespace.rst
+++ b/Documentation/firmware-guide/acpi/namespace.rst
@@ -261,7 +261,7 @@ Description Tables contain information used for the creation of the
 struct acpi_device objects represented by the given row (xSDT means DSDT
 or SSDT).
 
-The forth column of the above table indicates the 'bus_id' generation
+The fourth column of the above table indicates the 'bus_id' generation
 rule of the struct acpi_device object:
 
 _HID:
-- 
2.20.1
Re: [PATCH net-next v3] genetlink: do not parse attributes for families with zero maxattr
On Fri, 11 Oct 2019 09:40:09 +0200, Michal Kubecek wrote:
> Commit c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing
> to a separate function") moved attribute buffer allocation and attribute
> parsing from genl_family_rcv_msg_doit() into a separate function
> genl_family_rcv_msg_attrs_parse() which, unlike the previous code, calls
> __nlmsg_parse() even if family->maxattr is 0 (i.e. the family does its own
> parsing). The parser error is ignored and does not propagate out of
> genl_family_rcv_msg_attrs_parse() but an error message ("Unknown attribute
> type") is set in extack and if further processing generates no error or
> warning, it stays there and is interpreted as a warning by userspace.
>
> Dumpit requests are not affected as genl_family_rcv_msg_dumpit() bypasses
> the call of genl_family_rcv_msg_attrs_parse() if family->maxattr is zero.
> Move this logic inside genl_family_rcv_msg_attrs_parse() so that we don't
> have to handle it in each caller.
>
> v3: put the check inside genl_family_rcv_msg_attrs_parse()
> v2: adjust also argument of genl_family_rcv_msg_attrs_free()
>
> Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to
> a separate function")
> Signed-off-by: Michal Kubecek

Acked-by: Jakub Kicinski
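The idea in the commit message quoted above (check maxattr once, inside the parse helper, so no caller has to) can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the kernel code: 'struct family_model', parser_model() and attrs_parse_model() are hypothetical names, and the parser is a stub that only counts how often it runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a family: maxattr == 0 means "does its own parsing". */
struct family_model {
	unsigned int maxattr;
};

static int parse_calls; /* how many times the generic parser ran */

/* Stub standing in for the generic attribute parser. */
static int parser_model(void **attrbuf, unsigned int maxattr)
{
	(void)attrbuf;
	(void)maxattr;
	parse_calls++;
	return 0;
}

/* Returns an attribute buffer, or NULL with *err == 0 when the family
 * has no attributes to parse. Callers need no maxattr check of their own. */
static void **attrs_parse_model(const struct family_model *family, int *err)
{
	void **attrbuf;

	*err = 0;
	if (!family->maxattr)
		return NULL; /* skip allocation and parsing entirely */

	attrbuf = calloc(family->maxattr + 1, sizeof(*attrbuf));
	if (!attrbuf) {
		*err = -1;
		return NULL;
	}

	*err = parser_model(attrbuf, family->maxattr);
	if (*err) {
		free(attrbuf);
		return NULL;
	}
	return attrbuf;
}
```

With the check inside the helper, a family with maxattr of zero never reaches the parser, so no stray parser message can be left behind for it.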
Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski wrote:
> ..
>
> > But maybe we can go further: let's separate authentication and
> > authorization, as we do in other LSM hooks. Let's split my
> > inode_init_security_anon into two hooks, inode_init_security_anon and
> > inode_create_anon. We'd define the former to just initialize the file
> > object's security information --- in the SELinux case, figuring out
> > its class and SID --- and define the latter to answer the yes/no
> > question of whether a particular anonymous inode creation should be
> > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > or something, that would tell anon_inode_getfile2() to skip calling
> > the authorization hook, effectively making the creation always
> > succeed. We can then make the UFFD code pass
> > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > fork child while creating UFFD_EVENT_FORK messages.
>
> That sounds like an improvement. Or maybe just teach SELinux that
> this particular fd creation is actually making an anon_inode that is a
> child of an existing anon inode and that the context should be copied
> or whatever SELinux wants to do. Like this, maybe:
>
> static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
>                                   struct userfaultfd_ctx *new,
>                                   struct uffd_msg *msg)
> {
>         int fd;
>
> Change this:
>
>         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
>                               O_RDWR | (new->flags &
>                               UFFD_SHARED_FCNTL_FLAGS));
>
> to something like:
>
>         fd = anon_inode_make_child_fd(..., ctx->inode, ...);
>
> where ctx->inode is the one context's inode.

Yeah. I figured we could just add a special-purpose hook for this
case. Having a special hook for this one case feels ugly though, and
at copy_mm time, we don't have a PID for the new child yet --- I don't
know whether LSMs would care about that. But maybe this is one of
those "doctor, it hurts when I do this!" situations and this child
process difficulty is just a hint that some other design might work
better.

> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted. A .read implementation MUST NOT ACT ON THE
> CALLING TASK. Ever. Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK. Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> use UFFD_FEATURE_EVENT_FORK and print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed. And, after some suitable
> deprecation period, just remove it. If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall. Or even
> recvmsg() :)

IMHO, userfaultfd should have been a datagram socket from the start.
As you point out, it's a good fit for the UFFD protocol, which
involves FD passing and a fixed message size.

> And UFFD_SECURE should just become automatic, since you don't have a
> problem any more. :-p

Agreed. I'll wait to hear what everyone else has to say.
Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
[adding more people because this is going to be an ABI break, sigh]

On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione wrote:
>
> On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski wrote:
> >
> > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione
> > wrote:
> > >
> > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > file object instead of the default one, letting security modules
> > > supervise userfaultfd use.
> > >
> > > Requiring that users pass a new flag lets us avoid changing the
> > > semantics for existing callers.
> >
> > Is there any good reason not to make this be the default?
> >
> >
> > The only downside I can see is that it would increase the memory usage
> > of userfaultfd(), but that doesn't seem like such a big deal. A
> > lighter-weight alternative would be to have a single inode shared by
> > all userfaultfd instances, which would require a somewhat different
> > internal anon_inode API.
>
> I'd also prefer to just make SELinux use mandatory, but there's a
> nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> better way to deal with it.

...

> But maybe we can go further: let's separate authentication and
> authorization, as we do in other LSM hooks. Let's split my
> inode_init_security_anon into two hooks, inode_init_security_anon and
> inode_create_anon. We'd define the former to just initialize the file
> object's security information --- in the SELinux case, figuring out
> its class and SID --- and define the latter to answer the yes/no
> question of whether a particular anonymous inode creation should be
> allowed. Normally, anon_inode_getfile2() would just call both hooks.
> We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> or something, that would tell anon_inode_getfile2() to skip calling
> the authorization hook, effectively making the creation always
> succeed. We can then make the UFFD code pass
> ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> fork child while creating UFFD_EVENT_FORK messages.

That sounds like an improvement. Or maybe just teach SELinux that
this particular fd creation is actually making an anon_inode that is a
child of an existing anon inode and that the context should be copied
or whatever SELinux wants to do. Like this, maybe:

static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
                                  struct userfaultfd_ctx *new,
                                  struct uffd_msg *msg)
{
        int fd;

Change this:

        fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
                              O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));

to something like:

        fd = anon_inode_make_child_fd(..., ctx->inode, ...);

where ctx->inode is the one context's inode.

*** HOWEVER *** !!!

Now that you've pointed this mechanism out, it is utterly and
completely broken and should be removed from the kernel outright or at
least severely restricted. A .read implementation MUST NOT ACT ON THE
CALLING TASK. Ever. Just imagine the effect of passing a userfaultfd
as stdin to a setuid program.

So I think the right solution might be to attempt to *remove*
UFFD_EVENT_FORK. Maybe the solution is to say that, unless the
creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
use UFFD_FEATURE_EVENT_FORK and print a warning (once) when
UFFD_FEATURE_EVENT_FORK is allowed. And, after some suitable
deprecation period, just remove it. If it's genuinely useful, it
needs an entirely new API based on ioctl() or a syscall. Or even
recvmsg() :)

And UFFD_SECURE should just become automatic, since you don't have a
problem any more. :-p

--Andy
Re: [PATCH 0/2] media: meson: vdec: Add compliant H264 support
On Monday, 7 October 2019 at 16:59 +0200, Maxime Jourdan wrote:
> Hello,
>
> This patch series aims to bring H.264 support as well as compliance update
> to the amlogic stateful video decoder driver.
>
> There is 1 issue that remains currently:
>
> - The following codepath had to be commented out from v4l2-compliance as
> it led to stalling:
>
> if (node->codec_mask & STATEFUL_DECODER) {
>     struct v4l2_decoder_cmd cmd;
>     buffer buf_cap(m2m_q);
>
>     memset(&cmd, 0, sizeof(cmd));
>     cmd.cmd = V4L2_DEC_CMD_STOP;
>
>     /* No buffers are queued, call STREAMON, then STOP */
>     fail_on_test(node->streamon(q.g_type()));
>     fail_on_test(node->streamon(m2m_q.g_type()));
>     fail_on_test(doioctl(node, VIDIOC_DECODER_CMD, &cmd));
>
>     fail_on_test(buf_cap.querybuf(node, 0));
>     fail_on_test(buf_cap.qbuf(node));
>     fail_on_test(buf_cap.dqbuf(node));
>     fail_on_test(!(buf_cap.g_flags() & V4L2_BUF_FLAG_LAST));
>     for (unsigned p = 0; p < buf_cap.g_num_planes(); p++)
>         fail_on_test(buf_cap.g_bytesused(p));
>     fail_on_test(node->streamoff(q.g_type()));
>     fail_on_test(node->streamoff(m2m_q.g_type()));
>
>     /* Call STREAMON, queue one CAPTURE buffer, then STOP */
>     fail_on_test(node->streamon(q.g_type()));
>     fail_on_test(node->streamon(m2m_q.g_type()));
>     fail_on_test(buf_cap.querybuf(node, 0));
>     fail_on_test(buf_cap.qbuf(node));
>     fail_on_test(doioctl(node, VIDIOC_DECODER_CMD, &cmd));
>
>     fail_on_test(buf_cap.dqbuf(node));
>     fail_on_test(!(buf_cap.g_flags() & V4L2_BUF_FLAG_LAST));
>     for (unsigned p = 0; p < buf_cap.g_num_planes(); p++)
>         fail_on_test(buf_cap.g_bytesused(p));
>     fail_on_test(node->streamoff(q.g_type()));
>     fail_on_test(node->streamoff(m2m_q.g_type()));
> }
>
> The reason for this is because the driver has a limitation where all
> capture buffers must be queued to the driver before STREAMON is effective.
> The firmware needs to know in advance what all the buffers are before
> starting to decode.
> This limitation is enforced via q->min_buffers_needed.
> As such, in this compliance codepath, STREAMON is never actually called
> driver-side and there is a stall on fail_on_test(buf_cap.dqbuf(node));
>
>
> One last detail: V4L2_FMT_FLAG_DYN_RESOLUTION is currently not recognized
> by v4l2-compliance, so it was left out for the test. However, it is
> present in the patch series.
>
> The second patch has 3 "Alignment should match open parenthesis" lines
> where I preferred to keep them that way.
>
> Thanks Stanimir for sharing your HDR file creation tools, this was very
> helpful :).

I tried to test this with a pending branch of GStreamer supporting
dynamic resolution changes. The event-driven mechanism does not seem to
work with this driver. I've grepped the code, and don't see any places
where the event would be emitted. Then I grepped, and it seems the driver
accepts source_change subscription but does not set
V4L2_FMT_FLAG_DYN_RESOLUTION. I believe these two things are a bit
redundant and confusing; I'll fix the proposed patch nevertheless, and
see if that makes it work.

> > Maxime > > # v4l2-compliance --stream-from-hdr test-25fps.h264.hdr -s250 > v4l2-compliance SHA: a162244d47d4bb01d0692da879dce5a070f118e7, 64 bits > > Compliance test for meson-vdec device /dev/video0: > > Driver Info: > Driver name : meson-vdec > Card type: Amlogic Video Decoder > Bus info : platform:meson-vdec > Driver version : 5.4.0 > Capabilities : 0x84204000 > Video Memory-to-Memory Multiplanar > Streaming > Extended Pix Format > Device Capabilities > Device Caps : 0x04204000 > Video Memory-to-Memory Multiplanar > Streaming > Extended Pix Format > Detected Stateful Decoder > > Required ioctls: > test VIDIOC_QUERYCAP: OK > > Allow for multiple opens: > test second /dev/video0 open: OK > test VIDIOC_QUERYCAP: OK > test VIDIOC_G/S_PRIORITY: OK > test for unlimited opens: OK > > Debug ioctls: > test VIDIOC_DBG_G/S_REGISTER: OK (Not Supported) > test VIDIOC_LOG_STATUS: OK (Not Supported) > > Input ioctls: > test VIDIOC_G/S_TUNER/ENUM_FREQ_BANDS: OK (Not Supported) > test VIDIOC_G/S_FREQUENCY: OK (Not Supported) > test VIDIOC_S_HW_FREQ_SEEK: OK (Not Supported) > test VIDIOC_ENUMAUDIO: OK (Not Supported) > test VIDIOC_G/S/ENUMINPUT: OK (Not Supported) > test VIDIOC_G/S_AUDIO: OK (Not Supported) > Inputs: 0 Audio Inputs: 0 Tuners: 0 > > Output ioctls: > test VIDIOC_G/S_MODULATOR: OK (Not Supported) > test VIDIOC_G/S_FREQUENCY: OK (Not Supported) > test VIDIOC_ENUMAUDOUT: OK (Not Supported) > test VIDIOC_G/S/ENUMOUTPUT: OK (Not Supported) > test VIDIOC_G/S_AUDOUT: OK (Not Supported) >
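The stall described in the cover letter (STREAMON is accepted, but the driver never actually starts because too few capture buffers are queued) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the videobuf2 or meson-vdec code: 'struct queue_model' and the *_model() functions are hypothetical, and only the min_buffers_needed name is taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a buffer queue with a minimum-buffer requirement. */
struct queue_model {
	unsigned int min_buffers_needed; /* buffers required before starting */
	unsigned int queued_count;       /* buffers queued so far */
	bool streaming;                  /* userspace has called STREAMON */
	bool start_called;               /* the driver-side start actually ran */
};

/* Start the driver only once streaming was requested AND enough buffers
 * are queued; until then the start is deferred. */
static void maybe_start(struct queue_model *q)
{
	if (q->streaming && !q->start_called &&
	    q->queued_count >= q->min_buffers_needed)
		q->start_called = true;
}

static void streamon_model(struct queue_model *q)
{
	q->streaming = true;
	maybe_start(q);
}

static void qbuf_model(struct queue_model *q)
{
	q->queued_count++;
	maybe_start(q);
}
```

In this model, a test that calls STREAMON, queues a single buffer and then waits for it to come back would wait forever whenever min_buffers_needed is greater than one, which matches the behaviour the cover letter reports.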
Re: [PATCH 1/2] net: fec_main: Use platform_get_irq_byname_optional() to avoid error message
On Fri, 11 Oct 2019 12:55:20 +0300, Vladimir Oltean wrote:
> > > Unfortunately the networking subsystem sees around a 100 patches
> > > submitted each day, it'd be very hard to keep track of patches which have
> > > external dependencies and when to merge them. That's why we need the
> > > submitters to do this work for us and resubmit when the patch can be
> > > applied cleanly.
> >
> > OK, I will resend this patch series once the necessary patch lands
> > on the network tree.
>
> What has not been mentioned is that you can't create future
> dependencies for patches which have a Fixes: tag.
>
> git describe --tags 7723f4c5ecdb # driver core: platform: Add an error
> message to platform_get_irq*()
> v5.3-rc1-13-g7723f4c5ecdb
>
> git describe --tags f1da567f1dc # driver core: platform: Add
> platform_get_irq_byname_optional()
> v5.4-rc1-46-gf1da567f1dc1

Ack, you raise some good points.

AFAIU tho, in this case the broken patch, the dependency, and the fix are
all targeting 5.4, so there will be no real backporting hassle, while
the presence of a Fixes tag makes it clear where the regression was
introduced.
Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski wrote:
>
> On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione wrote:
> >
> > The new secure flag makes userfaultfd use a new "secure" anonymous
> > file object instead of the default one, letting security modules
> > supervise userfaultfd use.
> >
> > Requiring that users pass a new flag lets us avoid changing the
> > semantics for existing callers.
>
> Is there any good reason not to make this be the default?
>
>
> The only downside I can see is that it would increase the memory usage
> of userfaultfd(), but that doesn't seem like such a big deal. A
> lighter-weight alternative would be to have a single inode shared by
> all userfaultfd instances, which would require a somewhat different
> internal anon_inode API.

I'd also prefer to just make SELinux use mandatory, but there's a
nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
better way to deal with it.

Right now, when a process with a UFFD-managed VMA using
UFFD_EVENT_FORK forks, we make a new userfaultfd_ctx out of thin air
and enqueue it on the message queue for the parent process. When we
dequeue that context, we get to resolve_userfault_fork, which makes up
a new UFFD file object out of thin air in the context of the reading
process. Following normal SELinux rules, the SID attached to that new
file object would be the task SID of the process *reading* the fork
event, not the SID of the new fork child. That seems wrong, because
the label we give to the UFFD should correspond to the label of the
process that UFFD controls.

To try to solve this problem, we can move the file object creation to
the fork child and enqueue the file object itself instead of just the
userfaultfd_ctx, treating the dequeue as a file-descriptor-receive
operation just like a recvmsg of an AF_UNIX socket with SCM_RIGHTS.
(This approach seems more elegant anyway, since it reflects what's
actually going on.)

The trouble with the early-file-object-creation approach is that the
fork child may not be allowed to create UFFD file objects on its own
and an LSM can't tell the difference between UFFD_EVENT_FORK handling
creating the file object and the fork child just calling
userfaultfd(), meaning an LSM could veto the creation of the file
object for the fork event. We can't just create a
non-ANON_INODE_SECURE file object instead: that would defeat the whole
purpose of supervising UFFD using SELinux.

But maybe we can go further: let's separate authentication and
authorization, as we do in other LSM hooks. Let's split my
inode_init_security_anon into two hooks, inode_init_security_anon and
inode_create_anon. We'd define the former to just initialize the file
object's security information --- in the SELinux case, figuring out
its class and SID --- and define the latter to answer the yes/no
question of whether a particular anonymous inode creation should be
allowed. Normally, anon_inode_getfile2() would just call both hooks.
We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
or something, that would tell anon_inode_getfile2() to skip calling
the authorization hook, effectively making the creation always
succeed. We can then make the UFFD code pass
ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
fork child while creating UFFD_EVENT_FORK messages.

Granted, UFFD fork processing doesn't actually occur in the fork
child, but in copy_mm, in the parent --- but the right thing should
happen anyway, right?

I'm open to suggestions. In the meantime, I figured we'd just define a
UFFD_SECURE and make it incompatible with UFFD_EVENT_FORK.

> In any event, I don't think that "make me visible to SELinux" should
> be a choice that user code makes.

Right. The new unprivileged_userfaultfd setting is ugly, but it at
least removes the ability of unprivileged users to opt out of SELinux
supervision.
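The two-hook split proposed in this message (one hook that only labels the new file object, one that answers allow/deny, plus a flag that skips the second) can be sketched as a toy userspace model. Everything here is hypothetical and exists only for illustration: the *_model names, the integer "SID" and the single denied_sid policy are assumptions, and none of this is an existing kernel or LSM interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag modeled on the proposed ANON_INODE_SKIP_AUTHORIZATION. */
#define ANON_INODE_SKIP_AUTHORIZATION_MODEL 0x1u

/* Toy file object carrying only a security label. */
struct anon_file_model {
	int sid;
};

/* Labeling hook: initializes security information, never refuses. */
static void init_security_model(struct anon_file_model *f, int creator_sid)
{
	f->sid = creator_sid;
}

/* Authorization hook: a toy policy that denies exactly one SID. */
static bool create_allowed_model(int creator_sid, int denied_sid)
{
	return creator_sid != denied_sid;
}

/* Creation path: always label; consult the policy unless told to skip it.
 * Returns 0 on success, -1 when the policy vetoes the creation. */
static int getfile_model(struct anon_file_model *f, int creator_sid,
			 int denied_sid, unsigned int flags)
{
	init_security_model(f, creator_sid);
	if (!(flags & ANON_INODE_SKIP_AUTHORIZATION_MODEL) &&
	    !create_allowed_model(creator_sid, denied_sid))
		return -1;
	return 0;
}
```

The point of the split is visible in the second branch: a creation done on behalf of the kernel still gets a correct label, but cannot be vetoed by a policy that would refuse the same task calling the API directly.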
Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
On Sat, Oct 12, 2019 at 5:12 PM Daniel Colascione wrote: > > On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski wrote: > > > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione > > wrote: > > > > > > Use the secure anonymous inode LSM hook we just added to let SELinux > > > policy place restrictions on userfaultfd use. The create operation > > > applies to processes creating new instances of these file objects; > > > transfer between processes is covered by restrictions on read, write, > > > and ioctl access already checked inside selinux_file_receive. > > > > This is great, and I suspect we'll want it for things like SGX, too. > > But the current design seems like it will make it essentially > > impossible for SELinux to reference an anon_inode class whose > > file_operations are in a module, and moving file_operations out of a > > module would be nasty. > > > > Could this instead be keyed off a new struct anon_inode_class, an > > enum, or even just a string? > > The new LSM hook already receives the string that callers pass to the > anon_inode APIs; modules can look at that instead of the fops if they > want. The reason to pass both the name and the fops through the hook > is to allow LSMs to match using fops comparison (which seems less > prone to breakage) when possible and rely on string matching when it > isn't. I suppose that whoever makes the first module that wants to use this mechanism can have the fun task of reworking it. There's nothing user-visible here that would make it hard to change in the future.
Re: [PATCH 1/7 v2] tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
On Sat, 12 Oct 2019 20:35:02 -0400 Steven Rostedt wrote: > On Sat, 12 Oct 2019 15:56:15 -0700 > Linus Torvalds wrote: > > > On Fri, Oct 11, 2019 at 5:59 PM Steven Rostedt wrote: > > > > > > > > > > > I bisected this down to the addition of the proxy_ops into tracefs for > > > lockdown. It appears that the allocation of the proxy_ops and then freeing > > > it in the destroy_inode callback, is causing havoc with the memory system. > > > Reading the documentation about destroy_inode and talking with Linus about > > > this, this is buggy and wrong. > > > > Can you still add the explanation about the inode memory leak to this > > message? > > > > Right now it just says "it's buggy and wrong". True. But doesn't > > explain _why_ it is buggy and wrong. > > > > Sure. The patches just finished my testing (along with other fixes that > I need to send you). I have to make a few other updates in the change > log though, so I'll be rebasing them (but not touching the code), to > clean up the change logs. > I updated this change log to state: "I bisected this down to the addition of the proxy_ops into tracefs for lockdown. It appears that the allocation of the proxy_ops and then freeing it in the destroy_inode callback, is causing havoc with the memory system. Reading the documentation about destroy_inode and talking with Linus about this, this is buggy and wrong. When defining the destroy_inode() method, it is expected that the destroy_inode() will also free the inode, and not just the extra allocations done in the creation of the inode. The faulty commit causes a memory leak of the inode data structure when they are deleted." -- Steve
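The leak described in the updated change log can be modeled in a few lines of userspace C. This is a simplified sketch of the contract as the change log states it (a destroy callback, when defined, is expected to free the inode too), not the VFS code: 'struct inode_model', evict_model() and the two destroy variants are hypothetical, and live_inodes is just a counter for demonstration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inode with one extra allocation hanging off it. */
struct inode_model {
	void *private_data;
};

typedef void (*destroy_fn)(struct inode_model *);

static int live_inodes; /* inodes allocated and not yet freed */

static struct inode_model *alloc_inode_model(void)
{
	struct inode_model *inode = calloc(1, sizeof(*inode));

	if (!inode)
		return NULL;
	inode->private_data = malloc(16);
	live_inodes++;
	return inode;
}

static void free_inode_model(struct inode_model *inode)
{
	free(inode);
	live_inodes--;
}

/* Buggy variant: frees only the extra allocation, so the inode leaks. */
static void destroy_extra_only(struct inode_model *inode)
{
	free(inode->private_data);
	inode->private_data = NULL;
}

/* Correct variant: frees the extra allocation and the inode itself. */
static void destroy_and_free(struct inode_model *inode)
{
	free(inode->private_data);
	free_inode_model(inode);
}

/* Caller side: if a destroy callback exists, it owns freeing the inode. */
static void evict_model(struct inode_model *inode, destroy_fn destroy)
{
	if (destroy) {
		destroy(inode);
		return;
	}
	free(inode->private_data);
	free_inode_model(inode);
}
```

In the model, evicting with destroy_extra_only() leaves live_inodes at one for every inode deleted, which is the kind of leak the change log attributes to the reverted commit.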
[PATCH] xhci: Don't use soft retry if slot id > 0
According to the xhci specification (chapter 4.6.8.1) soft retry
shouldn't be used if the slot id is higher than 0. Currently some usb
devices break on some systems because soft retry is being used when
there is a transaction error, without checking the slot id.

Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors")
Signed-off-by: Bernhard Gebetsberger
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 85ceb43e3405..5fa06189068d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2270,7 +2270,7 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_td *td,
 		break;
 	case COMP_USB_TRANSACTION_ERROR:
 		if ((ep_ring->err_count++ > MAX_SOFT_RETRY) ||
-		    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT)
+		    le32_to_cpu(slot_ctx->tt_info) & TT_SLOT || slot_id > 0)
 			break;
 		*status = 0;
 		xhci_cleanup_halted_endpoint(xhci, slot_id, ep_index,
-- 
2.23.0
Re: [PATCH 1/7 v2] tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
On Sat, 12 Oct 2019 15:56:15 -0700 Linus Torvalds wrote: > On Fri, Oct 11, 2019 at 5:59 PM Steven Rostedt wrote: > > > > > > I bisected this down to the addition of the proxy_ops into tracefs for > > lockdown. It appears that the allocation of the proxy_ops and then freeing > > it in the destroy_inode callback, is causing havoc with the memory system. > > Reading the documentation about destroy_inode and talking with Linus about > > this, this is buggy and wrong. > > Can you still add the explanation about the inode memory leak to this message? > > Right now it just says "it's buggy and wrong". True. But doesn't > explain _why_ it is buggy and wrong. > Sure. The patches just finished my testing (along with other fixes that I need to send you). I have to make a few other updates in the change log though, so I'll be rebasing them (but not touching the code), to clean up the change logs. -- Steve
[PATCH v6 9/9] hugetlb_cgroup: Add hugetlb_cgroup reservation docs
Add docs for how to use hugetlb_cgroup reservations, and their behavior.

Signed-off-by: Mina Almasry
Acked-by: Hillf Danton

---

Changes in v6:
- Updated docs to reflect the new design based on a new counter that
tracks both reservations and faults.

---
 .../admin-guide/cgroup-v1/hugetlb.rst | 64 +++
 1 file changed, 53 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
index a3902aa253a96..efb94e4db9d5a 100644
--- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -2,13 +2,6 @@
 HugeTLB Controller
 ==================
 
-The HugeTLB controller allows to limit the HugeTLB usage per control group and
-enforces the controller limit during page fault. Since HugeTLB doesn't
-support page reclaim, enforcing the limit at page fault time implies that,
-the application will get SIGBUS signal if it tries to access HugeTLB pages
-beyond its limit. This requires the application to know beforehand how much
-HugeTLB pages it would require for its use.
-
 HugeTLB controller can be created by first mounting the cgroup filesystem.
 
   # mount -t cgroup -o hugetlb none /sys/fs/cgroup
@@ -28,10 +21,14 @@ process (bash) into it.
 
 Brief summary of control files::
 
- hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
- hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
- hugetlb.<hugepagesize>.usage_in_bytes     # show current usage for "hugepagesize" hugetlb
- hugetlb.<hugepagesize>.failcnt            # show the number of allocation failure due to HugeTLB limit
+ hugetlb.<hugepagesize>.reservation_limit_in_bytes     # set/show limit of "hugepagesize" hugetlb reservations
+ hugetlb.<hugepagesize>.reservation_max_usage_in_bytes # show max "hugepagesize" hugetlb reservations and no-reserve faults.
+ hugetlb.<hugepagesize>.reservation_usage_in_bytes     # show current reservations and no-reserve faults for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.reservation_failcnt            # show the number of allocation failure due to HugeTLB reservation limit
+ hugetlb.<hugepagesize>.limit_in_bytes                 # set/show limit of "hugepagesize" hugetlb faults
+ hugetlb.<hugepagesize>.max_usage_in_bytes             # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes                 # show current usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt                        # show the number of allocation failure due to HugeTLB usage limit
 
 For a system supporting three hugepage sizes (64k, 32M and 1G), the control
 files include::
@@ -40,11 +37,56 @@ files include::
   hugetlb.1GB.max_usage_in_bytes
   hugetlb.1GB.usage_in_bytes
   hugetlb.1GB.failcnt
+  hugetlb.1GB.reservation_limit_in_bytes
+  hugetlb.1GB.reservation_max_usage_in_bytes
+  hugetlb.1GB.reservation_usage_in_bytes
+  hugetlb.1GB.reservation_failcnt
   hugetlb.64KB.limit_in_bytes
   hugetlb.64KB.max_usage_in_bytes
   hugetlb.64KB.usage_in_bytes
   hugetlb.64KB.failcnt
+  hugetlb.64KB.reservation_limit_in_bytes
+  hugetlb.64KB.reservation_max_usage_in_bytes
+  hugetlb.64KB.reservation_usage_in_bytes
+  hugetlb.64KB.reservation_failcnt
   hugetlb.32MB.limit_in_bytes
   hugetlb.32MB.max_usage_in_bytes
   hugetlb.32MB.usage_in_bytes
   hugetlb.32MB.failcnt
+  hugetlb.32MB.reservation_limit_in_bytes
+  hugetlb.32MB.reservation_max_usage_in_bytes
+  hugetlb.32MB.reservation_usage_in_bytes
+  hugetlb.32MB.reservation_failcnt
+
+
+1. Reservation limits
+
+The HugeTLB controller allows to limit the HugeTLB reservations per control
+group and enforces the controller limit at reservation time and at the fault of
+hugetlb memory for which no reservation exists. Reservation limits
+are superior to Page fault limits (see section 2), since Reservation limits are
+enforced at reservation time (on mmap or shget), and never causes the
+application to get SIGBUS signal if the memory was reserved before hand. For
+MAP_NORESERVE allocations, the reservation limit behaves the same as the fault
+limit, enforcing memory usage at fault time and causing the application to
+receive a SIGBUS if it's crossing its limit.
+
+2. Page fault limits
+
+The HugeTLB controller allows to limit the HugeTLB usage (page fault) per
+control group and enforces the controller limit during page fault. Since HugeTLB
+doesn't support page reclaim, enforcing the limit at page fault time implies
+that, the application will get SIGBUS signal if it tries to access HugeTLB
+pages beyond its limit. This requires the application to know beforehand how
+much HugeTLB pages it would require for its use.
+
+
+3. Caveats with shared memory
+
+For shared hugetlb memory, both hugetlb reservation and page faults are charged
+to the first task that causes the memory to be reserved or faulted, and all
+subsequent uses of this reserved or faulted memory is done without charging.
+
+Shared hugetlb memory is only uncharged when it is unreserved or deallocated.
+This is usually
[PATCH] net: core: skbuff: skb_checksum_setup() drop err
Return directly from all switch cases, no point in storing in err.

Signed-off-by: Vito Caputo
---
 net/core/skbuff.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f5f904f46893..c59b68a413b5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4888,23 +4888,14 @@ static int skb_checksum_setup_ipv6(struct sk_buff *skb, bool recalculate)
  */
 int skb_checksum_setup(struct sk_buff *skb, bool recalculate)
 {
-	int err;
-
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
-		err = skb_checksum_setup_ipv4(skb, recalculate);
-		break;
-
+		return skb_checksum_setup_ipv4(skb, recalculate);
 	case htons(ETH_P_IPV6):
-		err = skb_checksum_setup_ipv6(skb, recalculate);
-		break;
-
+		return skb_checksum_setup_ipv6(skb, recalculate);
 	default:
-		err = -EPROTO;
-		break;
+		return -EPROTO;
 	}
-
-	return err;
 }
 EXPORT_SYMBOL(skb_checksum_setup);
 
-- 
2.11.0
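
For readers who want to try the pattern outside the kernel, here is a minimal userspace sketch of a protocol switch in which every case returns directly. It is an illustration only, not the kernel code: the `SKETCH_*` macros and `sketch_*` functions are hypothetical stand-ins for `ETH_P_*`, `htons()` and the per-protocol setup helpers, and the byte-swap macro assumes a little-endian host when the compiler does not define `__BYTE_ORDER__`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ETH_P_IP / ETH_P_IPV6 values. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/*
 * Constant-expression host-to-network conversion, so the result can be
 * used as a case label. Falls back to little-endian if the compiler does
 * not provide __BYTE_ORDER__ (an assumption of this sketch).
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_HTONS(x) ((uint16_t)(x))
#else
#define SKETCH_HTONS(x) \
	((uint16_t)((((x) & 0xffu) << 8) | (((x) >> 8) & 0xffu)))
#endif

/* Placeholder handlers; the real ones parse and validate packet headers. */
static int sketch_setup_ipv4(void) { return 0; }
static int sketch_setup_ipv6(void) { return 0; }

/* Every case returns, so no 'err' local and no trailing return is needed. */
static int sketch_checksum_setup(uint16_t protocol_be)
{
	switch (protocol_be) {
	case SKETCH_HTONS(SKETCH_ETH_P_IP):
		return sketch_setup_ipv4();
	case SKETCH_HTONS(SKETCH_ETH_P_IPV6):
		return sketch_setup_ipv6();
	default:
		return -EPROTO;
	}
}
```

Because the `default` case also returns, the compiler can see that control never reaches the end of the function, which is what makes dropping the final `return err;` safe.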
[PATCH v6 8/9] hugetlb_cgroup: Add hugetlb_cgroup reservation tests
The tests use both shared and private mapped hugetlb memory, and monitors the hugetlb usage counter as well as the hugetlb reservation counter. They test different configurations such as hugetlb memory usage via hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without MAP_POPULATE. Signed-off-by: Mina Almasry --- Changes in v6: - Updates tests for cgroups-v2 and NORESERVE allocations. --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 1 + .../selftests/vm/charge_reserved_hugetlb.sh | 527 ++ .../selftests/vm/write_hugetlb_memory.sh | 23 + .../testing/selftests/vm/write_to_hugetlbfs.c | 261 + 5 files changed, 813 insertions(+) create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 31b3c98b6d34d..d3bed9407773c 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -14,3 +14,4 @@ virtual_address_range gup_benchmark va_128TBswitch map_fixed_noreplace +write_to_hugetlbfs diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 9534dc2bc9295..31c2cc5cf30b5 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -18,6 +18,7 @@ TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += userfaultfd TEST_GEN_FILES += va_128TBswitch TEST_GEN_FILES += virtual_address_range +TEST_GEN_FILES += write_to_hugetlbfs TEST_PROGS := run_vmtests diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh new file mode 100755 index 0..278dd6475cd0f --- /dev/null +++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh @@ -0,0 +1,527 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +set -e + +if [[ $(id -u) -ne 0 ]]; then + 
echo "This test must be run as root. Skipping..." + exit 0 +fi + +cgroup_path=/dev/cgroup/memory +if [[ ! -e $cgroup_path ]]; then + mkdir -p $cgroup_path + mount -t cgroup2 none $cgroup_path +fi + +echo "+hugetlb" > /dev/cgroup/memory/cgroup.subtree_control + + +cleanup () { + echo $$ > $cgroup_path/cgroup.procs + + if [[ -e /mnt/huge ]]; then + rm -rf /mnt/huge/* + umount /mnt/huge || echo error + rmdir /mnt/huge + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test1 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test1 + fi + if [[ -e $cgroup_path/hugetlb_cgroup_test2 ]]; then + rmdir $cgroup_path/hugetlb_cgroup_test2 + fi + echo 0 > /proc/sys/vm/nr_hugepages + echo CLEANUP DONE +} + +function expect_equal() { + local expected="$1" + local actual="$2" + local error="$3" + + if [[ "$expected" != "$actual" ]]; then + echo "expected ($expected) != actual ($actual): $3" + cleanup + exit 1 + fi +} + +function setup_cgroup() { + local name="$1" + local cgroup_limit="$2" + local reservation_limit="$3" + + mkdir $cgroup_path/$name + + echo writing cgroup limit: "$cgroup_limit" + echo "$cgroup_limit" > $cgroup_path/$name/hugetlb.2MB.limit_in_bytes + + echo writing reseravation limit: "$reservation_limit" + echo "$reservation_limit" > \ + $cgroup_path/$name/hugetlb.2MB.reservation_limit_in_bytes + + if [ -e "$cgroup_path/$name/cpuset.cpus" ]; then +echo 0 > $cgroup_path/$name/cpuset.cpus + fi + if [ -e "$cgroup_path/$name/cpuset.mems" ]; then +echo 0 > $cgroup_path/$name/cpuset.mems + fi +} + +function wait_for_hugetlb_memory_to_get_depleted { + local cgroup="$1" + local path="/dev/cgroup/memory/$cgroup/hugetlb.2MB.reservation_usage_in_bytes" + # Wait for hugetlbfs memory to get depleted. + while [ $(cat $path) != 0 ]; do + echo Waiting for hugetlb memory to get depleted. 
+ cat $path + sleep 0.5 + done +} + +function wait_for_hugetlb_memory_to_get_reserved { + local cgroup="$1" + local size="$2" + + local path="/dev/cgroup/memory/$cgroup/hugetlb.2MB.reservation_usage_in_bytes" + # Wait for hugetlbfs memory to get written. + while [ $(cat $path) != $size ]; do + echo Waiting for hugetlb memory to reach size $size. + cat $path + sleep 0.5 + done +} + +function wait_for_hugetlb_memory_to_get_written { + local cgroup="$1" + local size="$2" + + local path="/dev/cgroup/memory/$cgroup/hugetlb.2MB.usage_in_bytes" + # Wait for hugetlbfs memory to get written. + while [ $(cat $path) != $size ]; do + echo Waiting for hugetlb
[PATCH v6 6/9] hugetlb_cgroup: add accounting for shared mappings
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives in the resv_map entries, in file_region->reservation_counter. After a call to region_chg, we charge the approprate hugetlb_cgroup, and if successful, we pass on the hugetlb_cgroup info to a follow up region_add call. When a file_region entry is added to the resv_map via region_add, we put the pointer to that cgroup in file_region->reservation_counter. If charging doesn't succeed, we report the error to the caller, so that the kernel fails the reservation. On region_del, which is when the hugetlb memory is unreserved, we also uncharge the file_region->reservation_counter. Signed-off-by: Mina Almasry --- mm/hugetlb.c | 147 --- 1 file changed, 116 insertions(+), 31 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f9c1947925bb9..af336bf227fb6 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -242,6 +242,15 @@ struct file_region { struct list_head link; long from; long to; +#ifdef CONFIG_CGROUP_HUGETLB + /* +* On shared mappings, each reserved region appears as a struct +* file_region in resv_map. These fields hold the info needed to +* uncharge each reservation. +*/ + struct page_counter *reservation_counter; + unsigned long pages_per_hpage; +#endif }; /* Helper that removes a struct file_region from the resv_map cache and returns @@ -250,12 +259,30 @@ struct file_region { static struct file_region * get_file_region_entry_from_cache(struct resv_map *resv, long from, long to); +/* Helper that records hugetlb_cgroup uncharge info. */ +static void record_hugetlb_cgroup_uncharge_info(struct hugetlb_cgroup *h_cg, + struct file_region *nrg, + struct hstate *h) +{ +#ifdef CONFIG_CGROUP_HUGETLB + if (h_cg) { + nrg->reservation_counter = + _cg->reserved_hugepage[hstate_index(h)]; + nrg->pages_per_hpage = pages_per_huge_page(h); + } else { + nrg->reservation_counter = NULL; + nrg->pages_per_hpage = 0; + } +#endif +} + /* Must be called with resv->lock held. 
Calling this with count_only == true * will count the number of pages to be added but will not modify the linked * list. */ static long add_reservation_in_range(struct resv_map *resv, long f, long t, -bool count_only) +struct hugetlb_cgroup *h_cg, +struct hstate *h, bool count_only) { long add = 0; struct list_head *head = >regions; @@ -291,6 +318,8 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, if (!count_only) { nrg = get_file_region_entry_from_cache( resv, last_accounted_offset, rg->from); + record_hugetlb_cgroup_uncharge_info(h_cg, nrg, + h); list_add(>link, rg->link.prev); } } @@ -306,11 +335,13 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, if (!count_only) { nrg = get_file_region_entry_from_cache( resv, last_accounted_offset, t); + record_hugetlb_cgroup_uncharge_info(h_cg, nrg, h); list_add(>link, rg->link.prev); } last_accounted_offset = t; } + VM_BUG_ON(add < 0); return add; } @@ -327,7 +358,8 @@ static long add_reservation_in_range(struct resv_map *resv, long f, long t, * Return the number of new huge pages added to the map. This * number is greater than or equal to zero. */ -static long region_add(struct resv_map *resv, long f, long t, +static long region_add(struct hstate *h, struct hugetlb_cgroup *h_cg, + struct resv_map *resv, long f, long t, long regions_needed) { long add = 0; @@ -336,7 +368,7 @@ static long region_add(struct resv_map *resv, long f, long t, VM_BUG_ON(resv->region_cache_count < regions_needed); - add = add_reservation_in_range(resv, f, t, false); + add = add_reservation_in_range(resv, f, t, h_cg, h, false); resv->adds_in_progress -= regions_needed; spin_unlock(>lock); @@ -398,7 +430,7 @@ static long region_chg(struct resv_map *resv, long f, long t, } /* Count how many hugepages in this range are NOT respresented. */ - chg = add_reservation_in_range(resv, f, t, true); + chg = add_reservation_in_range(resv, f, t, NULL, NULL, true); spin_unlock(>lock); return chg; @@
[PATCH v6 7/9] hugetlb_cgroup: support noreserve mappings
Support MAP_NORESERVE accounting as part of the new counter.

For each hugepage allocation, at allocation time we check if there is
a reservation for this allocation or not. If there is a reservation for
this allocation, then this allocation was charged at reservation time,
and we don't re-account it. If there is no reservation for this
allocation, we charge the appropriate hugetlb_cgroup.

The hugetlb_cgroup to uncharge for this allocation is stored in
page[3].private. We use new APIs added in an earlier patch to set this
pointer.

---
 mm/hugetlb.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index af336bf227fb6..79b99878ce6f9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1217,6 +1217,7 @@ static void update_and_free_page(struct hstate *h, struct page *page)
 				1 << PG_writeback);
 	}
 	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page, false), page);
+	VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page, true), page);
 	set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
 	set_page_refcounted(page);
 	if (hstate_is_gigantic(h)) {
@@ -1328,6 +1329,9 @@ void free_huge_page(struct page *page)
 	clear_page_huge_active(page);
 	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
 				     page, false);
+	hugetlb_cgroup_uncharge_page(hstate_index(h), pages_per_huge_page(h),
+				     page, true);
+
 	if (restore_reserve)
 		h->resv_huge_pages++;
 
@@ -1354,6 +1358,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
 	spin_lock(&hugetlb_lock);
 	set_hugetlb_cgroup(page, NULL, false);
+	set_hugetlb_cgroup(page, NULL, true);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
 	spin_unlock(&hugetlb_lock);
@@ -2155,10 +2160,19 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		gbl_chg = 1;
 	}
 
+	/* If this allocation is not consuming a reservation, charge it now.
+	 */
+	if (map_chg || avoid_reserve || !vma_resv_map(vma)) {
+		ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h),
+						   &h_cg, true);
+		if (ret)
+			goto out_subpool_put;
+	}
+
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg,
 					   false);
 	if (ret)
-		goto out_subpool_put;
+		goto out_uncharge_cgroup_reservation;
 
 	spin_lock(&hugetlb_lock);
 	/*
@@ -2182,6 +2196,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 	}
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, page,
 				     false);
+	if (!vma_resv_map(vma) || map_chg || avoid_reserve) {
+		hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg,
+					     page, true);
+	}
+
 	spin_unlock(&hugetlb_lock);
 
 	set_page_private(page, (unsigned long)spool);
@@ -2207,6 +2226,10 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg,
 				       false);
+out_uncharge_cgroup_reservation:
+	if (map_chg || avoid_reserve || !vma_resv_map(vma))
+		hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h),
+					       h_cg, true);
 out_subpool_put:
 	if (map_chg || avoid_reserve)
 		hugepage_subpool_put_pages(spool, 1);
-- 
2.23.0.700.g56cf767bdb-goog
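
The accounting rule in this patch can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not the kernel implementation: `struct sketch_cg`, `sketch_reserve()` and `sketch_fault()` are hypothetical names, counters are kept in whole pages, and there is no locking or cgroup hierarchy. It shows only that the reservation counter is charged at reservation time, or at fault time when the fault consumes no reservation, and that a failed usage charge rolls the reservation charge back.

```c
#include <stdbool.h>

/* Hypothetical model of one cgroup's two hugetlb counters, in pages. */
struct sketch_cg {
	long usage, usage_limit; /* faulted pages */
	long rsvd, rsvd_limit;   /* reservations plus no-reserve faults */
};

/* Reservation time (mmap/shmget): charge the reservation counter or fail. */
static bool sketch_reserve(struct sketch_cg *cg, long pages)
{
	if (cg->rsvd + pages > cg->rsvd_limit)
		return false;
	cg->rsvd += pages;
	return true;
}

/*
 * Fault time for one page. has_reservation is false for faults that do
 * not consume a reservation, such as MAP_NORESERVE mappings.
 */
static bool sketch_fault(struct sketch_cg *cg, bool has_reservation)
{
	bool charge_rsvd = !has_reservation;

	/* Not consuming a reservation: charge the reservation counter now. */
	if (charge_rsvd) {
		if (cg->rsvd + 1 > cg->rsvd_limit)
			return false;
		cg->rsvd += 1;
	}

	/* The usage counter is charged on every fault. */
	if (cg->usage + 1 > cg->usage_limit) {
		/* Roll back, like the out_uncharge_cgroup_reservation label. */
		if (charge_rsvd)
			cg->rsvd -= 1;
		return false;
	}
	cg->usage += 1;
	return true;
}
```

In this model a `false` return from `sketch_fault()` corresponds to the allocation failing at fault time, while a `false` return from `sketch_reserve()` corresponds to the mmap or shmget call itself failing.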
[PATCH v6 4/9] hugetlb_cgroup: add reservation accounting for private mappings
Normally the pointer to the cgroup to uncharge hangs off the struct page, and gets queried when it's time to free the page. With hugetlb_cgroup reservations, this is not possible. Because it's possible for a page to be reserved by one task and actually faulted in by another task. The best place to put the hugetlb_cgroup pointer to uncharge for reservations is in the resv_map. But, because the resv_map has different semantics for private and shared mappings, the code patch to charge/uncharge shared and private mappings is different. This patch implements charging and uncharging for private mappings. For private mappings, the counter to uncharge is in resv_map->reservation_counter. On initializing the resv_map this is set to NULL. On reservation of a region in private mapping, the tasks hugetlb_cgroup is charged and the hugetlb_cgroup is placed is resv_map->reservation_counter. On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter. Signed-off-by: Mina Almasry Acked-by: Hillf Danton --- include/linux/hugetlb.h| 8 +++ include/linux/hugetlb_cgroup.h | 11 + mm/hugetlb.c | 44 +- mm/hugetlb_cgroup.c| 12 -- 4 files changed, 62 insertions(+), 13 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 9c49a0ba894d3..36dcda7be4b0e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -46,6 +46,14 @@ struct resv_map { long adds_in_progress; struct list_head region_cache; long region_cache_count; +#ifdef CONFIG_CGROUP_HUGETLB + /* +* On private mappings, the counter to uncharge reservations is stored +* here. If these fields are 0, then the mapping is shared. 
+*/ + struct page_counter *reservation_counter; + unsigned long pages_per_hpage; +#endif }; extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 1bb58a63af586..f6e3d74a02536 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -25,6 +25,17 @@ struct hugetlb_cgroup; #define HUGETLB_CGROUP_MIN_ORDER 3 #ifdef CONFIG_CGROUP_HUGETLB +struct hugetlb_cgroup { + struct cgroup_subsys_state css; + /* +* the counter to account for hugepages from hugetlb. +*/ + struct page_counter hugepage[HUGE_MAX_HSTATE]; + /* +* the counter to account for hugepage reservations from hugetlb. +*/ + struct page_counter reserved_hugepage[HUGE_MAX_HSTATE]; +}; static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page, bool reserved) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 324859170463b..4a60d7d44b4c3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -665,6 +665,16 @@ struct resv_map *resv_map_alloc(void) INIT_LIST_HEAD(_map->regions); resv_map->adds_in_progress = 0; +#ifdef CONFIG_CGROUP_HUGETLB + /* +* Initialize these to 0. On shared mappings, 0's here indicate these +* fields don't do cgroup accounting. On private mappings, these will be +* re-initialized to the proper values, to indicate that hugetlb cgroup +* reservations are to be un-charged from here. +*/ + resv_map->reservation_counter = NULL; + resv_map->pages_per_hpage = 0; +#endif INIT_LIST_HEAD(_map->region_cache); list_add(>link, _map->region_cache); @@ -3217,7 +3227,18 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) reserve = (end - start) - region_count(resv, start, end); - kref_put(>refs, resv_map_release); +#ifdef CONFIG_CGROUP_HUGETLB + /* +* Since we check for HPAGE_RESV_OWNER above, this must a private +* mapping, and these values should be none-zero, and should point to +* the hugetlb_cgroup counter to uncharge for this reservation. 
+*/ + WARN_ON(!resv->reservation_counter); + WARN_ON(!resv->pages_per_hpage); + + hugetlb_cgroup_uncharge_counter(resv->reservation_counter, + (end - start) * resv->pages_per_hpage); +#endif if (reserve) { /* @@ -3227,6 +3248,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) gbl_reserve = hugepage_subpool_put_pages(spool, reserve); hugetlb_acct_memory(h, -gbl_reserve); } + + kref_put(>refs, resv_map_release); } static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) @@ -4560,6 +4583,7 @@ int hugetlb_reserve_pages(struct inode *inode, struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; + struct hugetlb_cgroup *h_cg; long gbl_reserve;
[PATCH net-next v2] hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication
From: Himadri Pandya

Current code assumes PAGE_SIZE (the guest page size) is equal
to the page size used to communicate with Hyper-V (which is
always 4K). While this assumption is true on x86, it may not
be true for Hyper-V on other architectures. For example,
Linux on ARM64 may have PAGE_SIZE of 16K or 64K. A new symbol,
HV_HYP_PAGE_SIZE, has been previously introduced to use when
the Hyper-V page size is intended instead of the guest page size.

Make this code work on non-x86 architectures by using the new
HV_HYP_PAGE_SIZE symbol instead of PAGE_SIZE, where appropriate.
Also replace the now redundant PAGE_SIZE_4K with HV_HYP_PAGE_SIZE.
The change has no effect on x86, but lays the groundwork to run
on ARM64 and others.

Signed-off-by: Himadri Pandya
Reviewed-by: Michael Kelley
---

Changes in v2:
* Revised commit message and subject [Jakub Kicinski]

---
 net/vmw_vsock/hyperv_transport.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 261521d..d2929ea 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -13,15 +13,16 @@
 #include
 #include
 #include
+#include
 
 /* Older (VMBUS version 'VERSION_WIN10' or before) Windows hosts have some
- * stricter requirements on the hv_sock ring buffer size of six 4K pages. Newer
- * hosts don't have this limitation; but, keep the defaults the same for compat.
+ * stricter requirements on the hv_sock ring buffer size of six 4K pages.
+ * hyperv-tlfs defines HV_HYP_PAGE_SIZE as 4K. Newer hosts don't have this
+ * limitation; but, keep the defaults the same for compat.
  */
-#define PAGE_SIZE_4K		4096
-#define RINGBUFFER_HVS_RCV_SIZE	(PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_SND_SIZE	(PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_MAX_SIZE	(PAGE_SIZE_4K * 64)
+#define RINGBUFFER_HVS_RCV_SIZE	(HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_SND_SIZE	(HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_MAX_SIZE	(HV_HYP_PAGE_SIZE * 64)
 
 /* The MTU is 16KB per the host side's design */
 #define HVS_MTU_SIZE		(1024 * 16)
@@ -54,7 +55,8 @@ struct hvs_recv_buf {
  * ringbuffer APIs that allow us to directly copy data from userspace buffer
  * to VMBus ringbuffer.
  */
-#define HVS_SEND_BUF_SIZE (PAGE_SIZE_4K - sizeof(struct vmpipe_proto_header))
+#define HVS_SEND_BUF_SIZE \
+		(HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
 
 struct hvs_send_buf {
 	/* The header before the payload data */
@@ -393,10 +395,10 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	} else {
 		sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
 		sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
-		sndbuf = ALIGN(sndbuf, PAGE_SIZE);
+		sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
 		rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
 		rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
-		rcvbuf = ALIGN(rcvbuf, PAGE_SIZE);
+		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
 	}
 
 	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
@@ -670,7 +672,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
 	ssize_t ret = 0;
 	ssize_t bytes_written = 0;
 
-	BUILD_BUG_ON(sizeof(*send_buf) != PAGE_SIZE_4K);
+	BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
 
 	send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
 	if (!send_buf)
-- 
1.8.3.1
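
To see why the alignment unit matters, the clamp-then-align step in hvs_open_connection() can be modeled in plain C. This is a sketch, not the driver code: the `SKETCH_*` macros and `sketch_*` functions are hypothetical names, and the 4096-byte value simply follows the patch's statement that the Hyper-V page size is always 4K.

```c
/* Stand-in for HV_HYP_PAGE_SIZE, which the patch describes as always 4K. */
#define SKETCH_HV_HYP_PAGE_SIZE 4096
#define SKETCH_RING_MIN (SKETCH_HV_HYP_PAGE_SIZE * 6)  /* default size */
#define SKETCH_RING_MAX (SKETCH_HV_HYP_PAGE_SIZE * 64) /* upper bound */

/* Round x up to a multiple of a; a must be a power of two. */
static int sketch_align(int x, int a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Clamp the requested size to [MIN, MAX], then align to the Hyper-V page. */
static int sketch_ring_size(int requested)
{
	int size = requested;

	if (size < SKETCH_RING_MIN)
		size = SKETCH_RING_MIN;
	if (size > SKETCH_RING_MAX)
		size = SKETCH_RING_MAX;
	return sketch_align(size, SKETCH_HV_HYP_PAGE_SIZE);
}
```

With a 4K alignment unit the six-page default stays at 24576 bytes. Aligning that same default to a 64K guest page instead, as `sketch_align(24576, 65536)` does, rounds it up to 65536 bytes, which is the kind of mismatch the switch from PAGE_SIZE to HV_HYP_PAGE_SIZE avoids.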
[PATCH v6 2/9] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
Augments hugetlb_cgroup_charge_cgroup to be able to charge hugetlb usage or hugetlb reservation counter. Adds a new interface to uncharge a hugetlb_cgroup counter via hugetlb_cgroup_uncharge_counter. Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init, hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline. Signed-off-by: Mina Almasry --- include/linux/hugetlb_cgroup.h | 67 +- mm/hugetlb.c | 17 +++--- mm/hugetlb_cgroup.c| 100 + 3 files changed, 130 insertions(+), 54 deletions(-) diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 063962f6dfc6a..1bb58a63af586 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -22,27 +22,35 @@ struct hugetlb_cgroup; * Minimum page order trackable by hugetlb cgroup. * At least 3 pages are necessary for all the tracking information. */ -#define HUGETLB_CGROUP_MIN_ORDER 2 +#define HUGETLB_CGROUP_MIN_ORDER 3 #ifdef CONFIG_CGROUP_HUGETLB -static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page, + bool reserved) { VM_BUG_ON_PAGE(!PageHuge(page), page); if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) return NULL; - return (struct hugetlb_cgroup *)page[2].private; + if (reserved) + return (struct hugetlb_cgroup *)page[3].private; + else + return (struct hugetlb_cgroup *)page[2].private; } -static inline -int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +static inline int set_hugetlb_cgroup(struct page *page, +struct hugetlb_cgroup *h_cg, +bool reservation) { VM_BUG_ON_PAGE(!PageHuge(page), page); if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) return -1; - page[2].private = (unsigned long)h_cg; + if (reservation) + page[3].private = (unsigned long)h_cg; + else + page[2].private = (unsigned long)h_cg; return 0; } @@ -52,26 +60,33 @@ static inline bool hugetlb_cgroup_disabled(void) } extern int hugetlb_cgroup_charge_cgroup(int 
idx, unsigned long nr_pages, - struct hugetlb_cgroup **ptr); + struct hugetlb_cgroup **ptr, + bool reserved); extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, struct hugetlb_cgroup *h_cg, -struct page *page); +struct page *page, bool reserved); extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages, -struct page *page); +struct page *page, bool reserved); + extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages, - struct hugetlb_cgroup *h_cg); + struct hugetlb_cgroup *h_cg, + bool reserved); +extern void hugetlb_cgroup_uncharge_counter(struct page_counter *p, + unsigned long nr_pages); + extern void hugetlb_cgroup_file_init(void) __init; extern void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage); #else -static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page) +static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page *page, + bool reserved) { return NULL; } -static inline -int set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) +static inline int set_hugetlb_cgroup(struct page *page, +struct hugetlb_cgroup *h_cg, bool reserved) { return 0; } @@ -81,28 +96,30 @@ static inline bool hugetlb_cgroup_disabled(void) return true; } -static inline int -hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, -struct hugetlb_cgroup **ptr) +static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages, + struct hugetlb_cgroup **ptr, + bool reserved) { return 0; } -static inline void -hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages, -struct hugetlb_cgroup *h_cg, -struct page *page) +static inline void hugetlb_cgroup_commit_charge(int idx, unsigned long
[PATCH v6 1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter
These counters will track hugetlb reservations rather than hugetlb
memory faulted in. This patch only adds the counter; the following
patches add the charging and uncharging of the counter.

Problem:
Currently tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate hugetlb memory that exceeds only its
hugetlb_cgroup limit, the kernel will allow the mmap/shmget call,
but will SIGBUS the task when it attempts to fault the memory in.

We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error at
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.

The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.

Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:

- While usage_in_bytes tracks all *faulted* hugetlb memory,
  reservation_usage_in_bytes tracks all *reserved* hugetlb memory and
  hugetlb memory faulted in without a prior reservation.

- If a task attempts to reserve more memory than limit_in_bytes allows,
  the kernel will allow it to do so. But if a task attempts to reserve
  more memory than reservation_limit_in_bytes, the kernel will fail this
  reservation.

This proposal is implemented in this patch series, with tests to verify
functionality and show the usage. We also added cgroup-v2 support to
hugetlb_cgroup so that the new use cases can be extended to v2.

Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
   the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
   duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
   hugetlb_cgroup seemed cleaner as well.

2. Instead of adding a new counter, we considered adding a sysctl that modifies
   the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
   reservation time rather than fault time. Adding a new page_counter seems
   better as userspace could, if it wants, choose to enforce different cgroups
   differently: one via limit_in_bytes, and another via
   reservation_limit_in_bytes. This could be very useful if you're
   transitioning how hugetlb memory is partitioned on your system one
   cgroup at a time, for example. Also, someone may find usage for both
   limit_in_bytes and reservation_limit_in_bytes concurrently, and this
   approach gives them the option to do so.

Testing:
- Added tests passing.
- libhugetlbfs tests mostly passing, but some tests have trouble with and
  without this patch series. This looks like an environment issue rather
  than a code issue:
  - Overall results:
    ********** TEST SUMMARY
    *                      2M
    *                      32-bit 64-bit
    *     Total testcases:    84      0
    *             Skipped:     0      0
    *                PASS:    66      0
    *                FAIL:    14      0
    *    Killed by signal:     0      0
    *   Bad configuration:     4      0
    *       Expected FAIL:     0      0
    *     Unexpected PASS:     0      0
    *    Test not present:     0      0
    * Strange test result:     0      0
    **********
  - Failing tests:
    - elflink_rw_and_share_test("linkhuge_rw") segfaults with and without this
      patch series.
    - LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32):
      FAIL Address is not hugepage
    - LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc
      HUGETLB_MORECORE=yes malloc (2M: 32):
      FAIL Address is not hugepage
    - LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32):
      FAIL Address is not hugepage
    - GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
      HUGETLB_MORECORE=yes heapshrink (2M: 32):
      FAIL Heap not on hugepages
    - GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
      libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32):
      FAIL Heap not on hugepages
    - HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
    - HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32):
      FAIL small_data is not hugepage
    - alloc-instantiate-race shared (2M: 32):
      Bad configuration: sched_setaffinity(cpu1): Invalid argument -
      FAIL Child 1 killed by signal Killed
    - shmoverride_linked (2M: 32):
      FAIL shmget failed size 2097152 from line 176: Invalid argument
    - HUGETLB_SHM=yes shmoverride_linked (2M: 32):
      FAIL shmget failed
[PATCH v6 3/9] hugetlb_cgroup: add cgroup-v2 support
---
 mm/hugetlb_cgroup.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 854117513979b..ac1500205faf7 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -503,8 +503,13 @@ static void __init __hugetlb_cgroup_file_init(int idx)
 	cft = &h->cgroup_files[HUGETLB_RES_NULL];
 	memset(cft, 0, sizeof(*cft));
 
-	WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
-					  h->cgroup_files));
+	if (cgroup_subsys_on_dfl(hugetlb_cgrp_subsys)) {
+		WARN_ON(cgroup_add_dfl_cftypes(&hugetlb_cgrp_subsys,
+					       h->cgroup_files));
+	} else {
+		WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
+						  h->cgroup_files));
+	}
 }
 
 void __init hugetlb_cgroup_file_init(void)
@@ -548,8 +553,14 @@ void hugetlb_cgroup_migrate(struct page *oldhpage, struct page *newhpage)
 	return;
 }
 
+static struct cftype hugetlb_files[] = {
+	{} /* terminate */
+};
+
 struct cgroup_subsys hugetlb_cgrp_subsys = {
 	.css_alloc	= hugetlb_cgroup_css_alloc,
 	.css_offline	= hugetlb_cgroup_css_offline,
 	.css_free	= hugetlb_cgroup_css_free,
+	.dfl_cftypes = hugetlb_files,
+	.legacy_cftypes = hugetlb_files,
 };
-- 
2.23.0.700.g56cf767bdb-goog
[PATCH v6 5/9] hugetlb: disable region_add file_region coalescing
A follow up patch in this series adds hugetlb cgroup uncharge info the file_region entries in resv->regions. The cgroup uncharge info may differ for different regions, so they can no longer be coalesced at region_add time. So, disable region coalescing in region_add in this patch. Behavior change: Say a resv_map exists like this [0->1], [2->3], and [5->6]. Then a region_chg/add call comes in region_chg/add(f=0, t=5). Old code would generate resv->regions: [0->5], [5->6]. New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5], [5->6]. Special care needs to be taken to handle the resv->adds_in_progress variable correctly. In the past, only 1 region would be added for every region_chg and region_add call. But now, each call may add multiple regions, so we can no longer increment adds_in_progress by 1 in region_chg, or decrement adds_in_progress by 1 after region_add or region_abort. Instead, region_chg calls add_reservation_in_range() to count the number of regions needed and allocates those, and that info is passed to region_add and region_abort to decrement adds_in_progress correctly. Signed-off-by: Mina Almasry --- Changes in v6: - Fix bug in number of region_caches allocated by region_chg --- mm/hugetlb.c | 256 +-- 1 file changed, 147 insertions(+), 109 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4a60d7d44b4c3..f9c1947925bb9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -244,6 +244,12 @@ struct file_region { long to; }; +/* Helper that removes a struct file_region from the resv_map cache and returns + * it for use. + */ +static struct file_region * +get_file_region_entry_from_cache(struct resv_map *resv, long from, long to); + /* Must be called with resv->lock held. Calling this with count_only == true * will count the number of pages to be added but will not modify the linked * list. 
@@ -251,51 +257,61 @@ struct file_region { static long add_reservation_in_range(struct resv_map *resv, long f, long t, bool count_only) { - long chg = 0; + long add = 0; struct list_head *head = &resv->regions; + long last_accounted_offset = f; struct file_region *rg = NULL, *trg = NULL, *nrg = NULL; - /* Locate the region we are before or in. */ - list_for_each_entry (rg, head, link) - if (f <= rg->to) - break; - - /* Round our left edge to the current segment if it encloses us. */ - if (f > rg->from) - f = rg->from; - - chg = t - f; + /* In this loop, we essentially handle an entry for the range + * last_accounted_offset -> rg->from, at every iteration, with some + * bounds checking. + */ + list_for_each_entry_safe(rg, trg, head, link) { + /* Skip irrelevant regions that start before our range. */ + if (rg->from < f) { + /* If this region ends after the last accounted offset, + * then we need to update last_accounted_offset. + */ + if (rg->to > last_accounted_offset) + last_accounted_offset = rg->to; + continue; + } - /* Check for and consume any regions we now overlap with. */ - nrg = rg; - list_for_each_entry_safe (rg, trg, rg->link.prev, link) { - if (&rg->link == head) - break; + /* When we find a region that starts beyond our range, we've + * finished. + */ if (rg->from > t) break; - /* We overlap with this area, if it extends further than - * us then we must extend ourselves. Account for its - * existing reservation. + /* Add an entry for last_accounted_offset -> rg->from, and + * update last_accounted_offset. 
*/ - if (rg->to > t) { - chg += rg->to - t; - t = rg->to; + if (rg->from > last_accounted_offset) { + add += rg->from - last_accounted_offset; + if (!count_only) { + nrg = get_file_region_entry_from_cache( + resv, last_accounted_offset, rg->from); + list_add(&nrg->link, rg->link.prev); + } } - chg -= rg->to - rg->from; - if (!count_only && rg != nrg) { - list_del(&rg->link); - kfree(rg); - } + last_accounted_offset = rg->to; } - if (!count_only) { - nrg->from = f; - nrg->to = t; + /* Handle the case where our range extends beyond +
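For readers following the "Behavior change" example in the commit message, here is a minimal user-space sketch of the non-coalescing walk. It is not the kernel code: it uses a sorted array instead of the resv->regions list, and the names struct region and add_gaps() are made up for illustration. The tail handling after the loop is an assumption, since the quoted hunk is cut off at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open page range [from, to), modelled on struct file_region. */
struct region {
	long from;
	long to;
};

/*
 * Walk a sorted, non-overlapping region array and record every gap inside
 * [f, t) as its own new region, without merging it into its neighbours.
 * Returns the number of pages added; *nout gets the number of new regions.
 * Passing out == NULL only counts (the count_only case).
 */
static long add_gaps(const struct region *regs, size_t n, long f, long t,
		     struct region *out, size_t *nout)
{
	long add = 0;
	long last = f;	/* plays the role of last_accounted_offset */
	size_t made = 0;
	size_t i;

	assert(f <= t);

	for (i = 0; i < n; i++) {
		const struct region *rg = &regs[i];

		/* Region starts before our range: it can only move 'last'. */
		if (rg->from < f) {
			if (rg->to > last)
				last = rg->to;
			continue;
		}

		/* Region starts beyond our range: we are done. */
		if (rg->from > t)
			break;

		/* Gap between the last accounted offset and this region. */
		if (rg->from > last) {
			add += rg->from - last;
			if (out) {
				out[made].from = last;
				out[made].to = rg->from;
			}
			made++;
		}
		last = rg->to;
	}

	/* Assumed tail handling: the range ends past the last region. */
	if (last < t) {
		add += t - last;
		if (out) {
			out[made].from = last;
			out[made].to = t;
		}
		made++;
	}

	if (nout)
		*nout = made;
	return add;
}
```

With the regions [0->1], [2->3], [5->6] and the call (f=0, t=5), the sketch adds 3 pages as two new entries, [1->2] and [3->5], which together with the existing entries gives the five-entry list from the commit message.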
Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski wrote: > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione wrote: > > > > Use the secure anonymous inode LSM hook we just added to let SELinux > > policy place restrictions on userfaultfd use. The create operation > > applies to processes creating new instances of these file objects; > > transfer between processes is covered by restrictions on read, write, > > and ioctl access already checked inside selinux_file_receive. > > This is great, and I suspect we'll want it for things like SGX, too. > But the current design seems like it will make it essentially > impossible for SELinux to reference an anon_inode class whose > file_operations are in a module, and moving file_operations out of a > module would be nasty. > > Could this instead be keyed off a new struct anon_inode_class, an > enum, or even just a string? The new LSM hook already receives the string that callers pass to the anon_inode APIs; modules can look at that instead of the fops if they want. The reason to pass both the name and the fops through the hook is to allow LSMs to match using fops comparison (which seems less prone to breakage) when possible and rely on string matching when it isn't.
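To make the "fops first, name as fallback" idea concrete, here is a small user-space sketch. Everything in it is hypothetical: struct fops_stub stands in for struct file_operations, builtin_uffd_fops stands in for a table the LSM can see, and the "[userfaultfd]" string is assumed to be the name passed to the anon_inode API. It is not the hook from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's struct file_operations (hypothetical). */
struct fops_stub {
	int unused;
};

/* Pretend this table is built in, so its address is visible to the LSM. */
static const struct fops_stub builtin_uffd_fops = { 0 };

/*
 * Decide whether an anonymous inode belongs to the userfaultfd class.
 * Pointer comparison is tried first; the name is only a fallback for
 * cases where the fops table cannot be referenced (e.g. it lives in a
 * module).
 */
static bool is_uffd_class(const char *name, const struct fops_stub *fops)
{
	/* The caller must supply at least one key to match on. */
	assert(name != NULL || fops != NULL);

	if (fops == &builtin_uffd_fops)
		return true;
	return name != NULL && strcmp(name, "[userfaultfd]") == 0;
}
```

The trade-off is the one described above: pointer comparison cannot be fooled by a renamed caller, while the string match keeps working when the fops symbol is out of reach.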
Re: [PATCH bpf v2] libbpf: fix passing uninitialized bytes to setsockopt
On Wed, Oct 09, 2019 at 06:49:29PM +0200, Ilya Maximets wrote: > 'struct xdp_umem_reg' has 4 bytes of padding at the end that makes > valgrind complain about passing uninitialized stack memory to the > syscall: > > Syscall param socketcall.setsockopt() points to uninitialised byte(s) > at 0x4E7AB7E: setsockopt (in /usr/lib64/libc-2.29.so) > by 0x4BDE035: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:172) > Uninitialised value was created by a stack allocation > at 0x4BDDEBA: xsk_umem__create@@LIBBPF_0.0.4 (xsk.c:140) > > Padding bytes appeared after introducing of a new 'flags' field. > memset() is required to clear them. > > Fixes: 10d30e301732 ("libbpf: add flags to umem config") > Signed-off-by: Ilya Maximets > --- > > Version 2: > * Struct initializer replaced with explicit memset(). [Andrii] > > tools/lib/bpf/xsk.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c > index a902838f9fcc..9d5348086203 100644 > --- a/tools/lib/bpf/xsk.c > +++ b/tools/lib/bpf/xsk.c > @@ -163,6 +163,7 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, > void *umem_area, > umem->umem_area = umem_area; > xsk_set_umem_config(&umem->config, usr_config); > > + memset(&mr, 0, sizeof(mr)); > mr.addr = (uintptr_t)umem_area; > mr.len = size; > mr.chunk_size = umem->config.frame_size; This was already applied. Why did you resend?
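As a side note on why the memset() matters, here is a small stand-alone sketch. struct umem_reg_like and fill_reg() are illustrative names, not libbpf API; the struct only mimics the shape described in the commit message (two 64-bit fields followed by three 32-bit ones). On common 64-bit ABIs that layout ends with 4 padding bytes, which a field-by-field assignment never writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as the struct discussed above. */
struct umem_reg_like {
	uint64_t addr;
	uint64_t len;
	uint32_t chunk_size;
	uint32_t headroom;
	uint32_t flags;	/* on common 64-bit ABIs, 4 padding bytes follow */
};

/* Zero the whole object first, padding included, then set the fields. */
static void fill_reg(struct umem_reg_like *mr, void *area, uint64_t size,
		     uint32_t chunk_size)
{
	assert(mr != NULL);
	memset(mr, 0, sizeof(*mr));
	mr->addr = (uintptr_t)area;
	mr->len = size;
	mr->chunk_size = chunk_size;
}
```

Strictly speaking the C standard leaves padding contents unspecified after a member store, but compilers in practice leave those bytes alone, so clearing the object up front is the usual way to keep stale stack bytes out of a buffer handed to a syscall.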
Re: [PATCH 3/3] rtc: ds1685: add indirect access method and remove plat_read/plat_write
On 10/11/2019 11:05, Thomas Bogendoerfer wrote: > Use of provided plat_read/plat_write introduces the problem of possible > different lifetime of rtc driver and plat_XXX function provider. As > this was only intended for SGI Octane (IP30) this patchset implements > a register indirect access method for IP30 and introduces an > access_type field in platform data to select how registers are > accessed. And since there are no resource allocating stunts needed > anymore it also gets rid of alloc_io_resources from platform data. > Actually, I did it this way because IP32 was already in-tree, and IP30 was not. So the default ds1685_{read,write} functions were geared for the in-tree machine, and IP30 brought along its own versions. If IP30 support gets merged into the kernel, this isn't needed anymore, but I don't think this explanation accurately captures that. The chief difference between IP32 and IP30's manner of accessing the RTC is that IP32 has a 256-byte gap between each RTC register for unknown reasons (this is documented in the IP32 hardware data sheets I have), and access has to be MMIO'ed, since the RTC is hanging off of the MACE PCI structs, like every other device in IP32's code. IP30 doesn't have this register gap to worry about, and it accesses the RTC registers via PIO. 
> Signed-off-by: Thomas Bogendoerfer > --- > arch/mips/sgi-ip32/ip32-platform.c | 2 +- > drivers/rtc/rtc-ds1685.c | 67 > -- > include/linux/rtc/ds1685.h | 8 +++-- > 3 files changed, 48 insertions(+), 29 deletions(-) > > diff --git a/arch/mips/sgi-ip32/ip32-platform.c > b/arch/mips/sgi-ip32/ip32-platform.c > index 5a2a82148d8d..c3909bd8dd1a 100644 > --- a/arch/mips/sgi-ip32/ip32-platform.c > +++ b/arch/mips/sgi-ip32/ip32-platform.c > @@ -115,7 +115,7 @@ ip32_rtc_platform_data[] = { > .bcd_mode = true, > .no_irq = false, > .uie_unsupported = false, > - .alloc_io_resources = true, > + .access_type = ds1685_reg_direct, > .plat_prepare_poweroff = ip32_prepare_poweroff, > }, > }; > diff --git a/drivers/rtc/rtc-ds1685.c b/drivers/rtc/rtc-ds1685.c > index 349a8d1caca1..9c5d064ebb6c 100644 > --- a/drivers/rtc/rtc-ds1685.c > +++ b/drivers/rtc/rtc-ds1685.c > @@ -59,6 +59,32 @@ ds1685_write(struct ds1685_priv *rtc, int reg, u8 value) > } > /* --- */ > > +/* Indirect read/write functions */ > + > +/** > + * ds1685_indir_read - read a value from an rtc register. > + * @rtc: pointer to the ds1685 rtc structure. > + * @reg: the register address to read. > + */ > +static u8 > +ds1685_indir_read(struct ds1685_priv *rtc, int reg) > +{ > + writeb(reg, rtc->regs); > + return readb(rtc->data); > +} > + > +/** > + * ds1685_indir_write - write a value to an rtc register. > + * @rtc: pointer to the ds1685 rtc structure. > + * @reg: the register address to write. > + * @value: value to write to the register. > + */ > +static void > +ds1685_indir_write(struct ds1685_priv *rtc, int reg, u8 value) > +{ > + writeb(reg, rtc->regs); > + writeb(value, rtc->data); > +} IP30 applied a mask of 0x7f on the 'reg' parameter on both of its read/write functions, which was from Stan's original code. Is this mask not needed any more with the other changes you made to the IP30 code? I remember trying to do without this mask once long ago, and something broke, so I have left it in ever since. 
> > /* --- */ > /* Inlined functions */ > @@ -1062,16 +1088,25 @@ ds1685_rtc_probe(struct platform_device *pdev) > if (!rtc) > return -ENOMEM; > > - /* > - * Allocate/setup any IORESOURCE_MEM resources, if required. Not all > - * platforms put the RTC in an easy-access place. Like the SGI Octane, > - * which attaches the RTC to a "ByteBus", hooked to a SuperIO chip > - * that sits behind the IOC3 PCI metadevice. > - */ > - if (pdata->alloc_io_resources) { > + /* Setup resources and access functions */ > + switch (pdata->access_type) { > + case ds1685_reg_direct: > + rtc->regs = devm_platform_ioremap_resource(pdev, 0); > + if (IS_ERR(rtc->regs)) > + return PTR_ERR(rtc->regs); > + rtc->read = ds1685_read; > + rtc->write = ds1685_write; > + break; > + case ds1685_reg_indirect: > rtc->regs = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(rtc->regs)) > return PTR_ERR(rtc->regs); > + rtc->data = devm_platform_ioremap_resource(pdev, 1); > + if (IS_ERR(rtc->data)) > + return PTR_ERR(rtc->data); > + rtc->read = ds1685_indir_read; > + rtc->write = ds1685_indir_write; > + break; > } I
Re: [PATCH v5 bpf-next 00/15] samples: bpf: improve/fix cross-compilation
On Fri, Oct 11, 2019 at 5:07 AM Ilias Apalodimas wrote: > > On Fri, Oct 11, 2019 at 03:27:53AM +0300, Ivan Khoronzhuk wrote: > > This series contains mainly fixes/improvements for cross-compilation > > but not only, tested for arm, arm64, and intended for any arch. > > Also verified on native build (not cross compilation) for x86_64 > > and arm, arm64. ... > For native compilation on x86_64 and aarch64 > > Tested-by: Ilias Apalodimas Applied. Thanks
Re: [PATCH 6/7] Allow users to require UFFD_SECURE
On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione wrote: > > This change adds 2 as an allowable value for > unprivileged_userfaultfd. (Previously, this sysctl could be either 0 > or 1.) When unprivileged_userfaultfd is 2, users with CAP_SYS_PTRACE > may create userfaultfd with or without UFFD_SECURE, but users without > CAP_SYS_PTRACE must pass UFFD_SECURE to userfaultfd in order for the > system call to succeed, effectively forcing them to opt into > additional security checks. This patch can go away entirely if you make UFFD_SECURE automatic.
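The proposed policy can be written down as a small decision function. This is a sketch, not the patch: uffd_create_allowed() is a made-up name, and the meaning of values 0 and 1 (0 restricts creation to CAP_SYS_PTRACE holders, 1 allows everyone) is my understanding of the existing sysctl rather than something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true if creating a userfaultfd should be allowed.
 * sysctl_val:     value of unprivileged_userfaultfd (0, 1 or 2)
 * has_sys_ptrace: caller holds CAP_SYS_PTRACE
 * secure_flag:    caller passed UFFD_SECURE
 */
static bool uffd_create_allowed(int sysctl_val, bool has_sys_ptrace,
				bool secure_flag)
{
	assert(sysctl_val >= 0 && sysctl_val <= 2);

	if (has_sys_ptrace)
		return true;		/* privileged callers are not restricted */

	switch (sysctl_val) {
	case 0:
		return false;		/* unprivileged use disabled (assumed) */
	case 1:
		return true;		/* unprivileged use allowed (assumed) */
	default:
		return secure_flag;	/* 2: allowed only with UFFD_SECURE */
	}
}
```

If UFFD_SECURE became automatic, as suggested in the reply, the third case would collapse into the second and the extra sysctl value would not be needed.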
Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione wrote: > > The new secure flag makes userfaultfd use a new "secure" anonymous > file object instead of the default one, letting security modules > supervise userfaultfd use. > > Requiring that users pass a new flag lets us avoid changing the > semantics for existing callers. Is there any good reason not to make this be the default? The only downside I can see is that it would increase the memory usage of userfaultfd(), but that doesn't seem like such a big deal. A lighter-weight alternative would be to have a single inode shared by all userfaultfd instances, which would require a somewhat different internal anon_inode API. In any event, I don't think that "make me visible to SELinux" should be a choice that user code makes. --Andy
Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione wrote: > > Use the secure anonymous inode LSM hook we just added to let SELinux > policy place restrictions on userfaultfd use. The create operation > applies to processes creating new instances of these file objects; > transfer between processes is covered by restrictions on read, write, > and ioctl access already checked inside selinux_file_receive. This is great, and I suspect we'll want it for things like SGX, too. But the current design seems like it will make it essentially impossible for SELinux to reference an anon_inode class whose file_operations are in a module, and moving file_operations out of a module would be nasty. Could this instead be keyed off a new struct anon_inode_class, an enum, or even just a string? --Andy
Re: [PATCH 1/7 v2] tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
On Fri, Oct 11, 2019 at 5:59 PM Steven Rostedt wrote: > > > I bisected this down to the addition of the proxy_ops into tracefs for > lockdown. It appears that the allocation of the proxy_ops and then freeing > it in the destroy_inode callback, is causing havoc with the memory system. > Reading the documentation about destroy_inode and talking with Linus about > this, this is buggy and wrong. Can you still add the explanation about the inode memory leak to this message? Right now it just says "it's buggy and wrong". True. But doesn't explain _why_ it is buggy and wrong. Linus
Re: [GIT PULL] Staging/IIO driver fixes for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 18:16:38 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git > tags/staging-5.4-rc3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/9cbc63485fd5e25cef5d64c28ca3318364073773 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Char/Misc driver fixes for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 18:16:59 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git > tags/char-misc-5.4-rc3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/da94001239cceb93c132a31928d6ddc4214862d5 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] USB fixes for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 18:15:53 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git tags/usb-5.4-rc3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/6c90bbd0a4e133665128a941ffcb4f7ac5dcb3cf Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] TTY/Serial fixes for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 18:16:14 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tags/tty-5.4-rc3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/82c87e7d4068d0fc368c3e7356a94e7b87c29544 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH 2/3] rtc: ds1685: use devm_platform_ioremap_resource helper
On 10/11/2019 11:05, Thomas Bogendoerfer wrote: > Simplify ioremapping of registers by using devm_platform_ioremap_resource. > > Signed-off-by: Thomas Bogendoerfer > --- > drivers/rtc/rtc-ds1685.c | 23 +++ > include/linux/rtc/ds1685.h | 1 - > 2 files changed, 3 insertions(+), 21 deletions(-) > > diff --git a/drivers/rtc/rtc-ds1685.c b/drivers/rtc/rtc-ds1685.c > index 51f568473de8..349a8d1caca1 100644 > --- a/drivers/rtc/rtc-ds1685.c > +++ b/drivers/rtc/rtc-ds1685.c > @@ -1040,7 +1040,6 @@ static int > ds1685_rtc_probe(struct platform_device *pdev) > { > struct rtc_device *rtc_dev; > - struct resource *res; > struct ds1685_priv *rtc; > struct ds1685_rtc_platform_data *pdata; > u8 ctrla, ctrlb, hours; > @@ -1070,25 +1069,9 @@ ds1685_rtc_probe(struct platform_device *pdev) > * that sits behind the IOC3 PCI metadevice. > */ > if (pdata->alloc_io_resources) { > - /* Get the platform resources. */ > - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > - if (!res) > - return -ENXIO; > - rtc->size = resource_size(res); > - > - /* Request a memory region. */ > - /* XXX: mmio-only for now. */ > - if (!devm_request_mem_region(&pdev->dev, res->start, rtc->size, > - pdev->name)) > - return -EBUSY; > - > - /* > - * Set the base address for the rtc, and ioremap its > - * registers. > - */ > - rtc->regs = devm_ioremap(&pdev->dev, res->start, rtc->size); > - if (!rtc->regs) > - return -ENOMEM; > + rtc->regs = devm_platform_ioremap_resource(pdev, 0); > + if (IS_ERR(rtc->regs)) > + return PTR_ERR(rtc->regs); > } > > /* Get the register step size. */ > diff --git a/include/linux/rtc/ds1685.h b/include/linux/rtc/ds1685.h > index b9671d00d964..101c7adc05a2 100644 > --- a/include/linux/rtc/ds1685.h > +++ b/include/linux/rtc/ds1685.h > @@ -43,7 +43,6 @@ struct ds1685_priv { > struct rtc_device *dev; > void __iomem *regs; > u32 regstep; > - size_t size; > int irq_num; > bool bcd_mode; > bool no_irq; > Acked-by: Joshua Kinard
Re: [PATCH 1/3] rts: ds1685: remove not needed fields from private struct
On 10/11/2019 11:05, Thomas Bogendoerfer wrote: > A few of the fields in struct ds1685_priv aren't needed at all, > so we can remove it. > > Signed-off-by: Thomas Bogendoerfer > --- > drivers/rtc/rtc-ds1685.c | 3 --- > include/linux/rtc/ds1685.h | 3 --- > 2 files changed, 6 deletions(-) > > diff --git a/drivers/rtc/rtc-ds1685.c b/drivers/rtc/rtc-ds1685.c > index 184e4a3e2bef..51f568473de8 100644 > --- a/drivers/rtc/rtc-ds1685.c > +++ b/drivers/rtc/rtc-ds1685.c > @@ -1086,12 +1086,10 @@ ds1685_rtc_probe(struct platform_device *pdev) > * Set the base address for the rtc, and ioremap its > * registers. > */ > - rtc->baseaddr = res->start; > rtc->regs = devm_ioremap(&pdev->dev, res->start, rtc->size); > if (!rtc->regs) > return -ENOMEM; > } > - rtc->alloc_io_resources = pdata->alloc_io_resources; > > /* Get the register step size. */ > if (pdata->regstep > 0) > @@ -1271,7 +1269,6 @@ ds1685_rtc_probe(struct platform_device *pdev) > /* See if the platform doesn't support UIE. */ > if (pdata->uie_unsupported) > rtc_dev->uie_unsupported = 1; > - rtc->uie_unsupported = pdata->uie_unsupported; > > rtc->dev = rtc_dev; > > diff --git a/include/linux/rtc/ds1685.h b/include/linux/rtc/ds1685.h > index 43aec568ba7c..b9671d00d964 100644 > --- a/include/linux/rtc/ds1685.h > +++ b/include/linux/rtc/ds1685.h > @@ -43,13 +43,10 @@ struct ds1685_priv { > struct rtc_device *dev; > void __iomem *regs; > u32 regstep; > - resource_size_t baseaddr; > size_t size; > int irq_num; > bool bcd_mode; > bool no_irq; > - bool uie_unsupported; > - bool alloc_io_resources; > u8 (*read)(struct ds1685_priv *, int); > void (*write)(struct ds1685_priv *, int, u8); > void (*prepare_poweroff)(void); > Acked-by: Joshua Kinard
Re: [PATCH net 0/2] vsock: don't allow half-closed socket in the host transports
On Fri, Oct 11, 2019 at 04:34:57PM +0200, Stefano Garzarella wrote: > On Fri, Oct 11, 2019 at 10:19:13AM -0400, Michael S. Tsirkin wrote: > > On Fri, Oct 11, 2019 at 03:07:56PM +0200, Stefano Garzarella wrote: > > > We are implementing a test suite for the VSOCK sockets and we discovered > > > that vmci_transport never allowed half-closed socket on the host side. > > > > > > As Jorgen explained [1] this is due to the implementation of VMCI. > > > > > > Since we want to have the same behaviour across all transports, this > > > series adds a section in the "Implementation notes" to explain this > > > behaviour, and changes the vhost_transport to behave the same way. > > > > > > [1] https://patchwork.ozlabs.org/cover/847998/#1831400 > > > > Half closed sockets are very useful, and lots of > > applications use tricks to swap a vsock for a tcp socket, > > which might as a result break. > > Got it! > > > > > If VMCI really cares it can implement an ioctl to > > allow applications to detect that half closed sockets aren't supported. > > > > It does not look like VMCI wants to bother (users do not read > > kernel implementation notes) so it does not really care. > > So why do we want to cripple other transports intentionally? > > The main reason is that we are developing the test suite and we noticed > the mismatch. Since we want to make sure that applications behave in > the same way on different transports, we thought we would solve it that > way. > > But what you are saying (also in the reply of the patches) is actually > quite right. Not being publicized, applications do not expect this behavior, > so please discard this series. > > My problem during the tests, was trying to figure out if half-closed > sockets were supported or not, so as you say adding an IOCTL or maybe > better a getsockopt() could solve the problem. > > What do you think? > > Thanks, > Stefano Sure, why not.
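For anyone unsure what "half-closed" means in practice, here is a minimal sketch using an AF_UNIX socket pair as a stand-in for AF_VSOCK (an assumption made so the example runs anywhere; half_close_demo() is an illustrative name). It does not show the getsockopt() discussed above, since no such option is defined in this thread.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Half-close one end of a stream socket pair and check that data still
 * flows in the other direction. Returns 0 on success, -1 on any failure.
 */
static int half_close_demo(void)
{
	int sv[2];
	char buf[8];
	ssize_t n;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* sv[0] stops sending; its receive side stays open. */
	if (shutdown(sv[0], SHUT_WR) < 0)
		goto out;

	/* The peer sees end-of-file on its read side... */
	n = read(sv[1], buf, sizeof(buf));
	if (n != 0)
		goto out;

	/* ...but can still send a reply that sv[0] receives. */
	if (write(sv[1], "pong", 4) != 4)
		goto out;
	n = read(sv[0], buf, sizeof(buf));
	if (n != 4 || memcmp(buf, "pong", 4) != 0)
		goto out;

	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

A transport that does not support half-closed sockets would behave differently at the reply step, which is the kind of mismatch the test suite ran into.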
Re: [GIT PULL] perf fixes
The pull request you sent on Sat, 12 Oct 2019 15:31:34 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > perf-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/465a7e291fd4f056d81baf5d5ed557bdb44c5457 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] EFI fixes
The pull request you sent on Sat, 12 Oct 2019 15:01:39 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/9b4e40c8fe1e120fef93985de7ff6a97fe9e7dd3 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] x86 fixes
The pull request you sent on Sat, 12 Oct 2019 15:19:16 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/7a275fd7b9519b5cc63270a8964055aadb04de26 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] x86 license updates
The pull request you sent on Sat, 12 Oct 2019 13:52:57 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > core-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/e9ec3588a9372dfb9b04afcddb199ad9e2be0044 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] scheduler fixes
The pull request you sent on Sat, 12 Oct 2019 16:58:36 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git > sched-urgent-for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/328fefadd9cfa15cd6ab746553d9ef13303c11a6 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH] netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit
On Fri, 11 Oct 2019 17:46:53 +0800, YueHaibing wrote: > In nsim_fib_init(), if register_fib_notifier failed, nsim_fib_net_ops > should be unregistered before return. > > In nsim_fib_exit(), unregister_fib_notifier should be called before > nsim_fib_net_ops be unregistered, otherwise may cause use-after-free: > > BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim] > Read of size 8 at addr 8881daaf4388 by task kworker/0:3/3499 > > Reported-by: Hulk Robot > Fixes: 59c84b9fcf42 ("netdevsim: Restore per-network namespace accounting for > fib entries") > Signed-off-by: YueHaibing Acked-by: Jakub Kicinski
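The ordering rule in this fix is the usual "unwind in reverse" pattern. Below is a user-space sketch with made-up stand-ins (register_ops(), register_notifier(), demo_init(), demo_exit()); none of these are the netdevsim functions, and the fail_notifier knob exists only to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two registrations discussed above. */
static bool ops_registered;
static bool notifier_registered;
static bool fail_notifier;	/* test knob: force the second step to fail */

static int register_ops(void)
{
	ops_registered = true;
	return 0;
}

static void unregister_ops(void)
{
	ops_registered = false;
}

static int register_notifier(void)
{
	if (fail_notifier)
		return -1;
	notifier_registered = true;
	return 0;
}

static void unregister_notifier(void)
{
	/* The notifier callback relies on state owned by the ops. */
	assert(ops_registered);
	notifier_registered = false;
}

static int demo_init(void)
{
	int err;

	err = register_ops();
	if (err)
		return err;

	err = register_notifier();
	if (err)
		goto err_ops;	/* undo the step that already succeeded */

	return 0;

err_ops:
	unregister_ops();
	return err;
}

static void demo_exit(void)
{
	/* Tear down in reverse order of setup. */
	unregister_notifier();
	unregister_ops();
}
```

Swapping the two calls in demo_exit() would trip the assertion in this model; in the kernel the analogous mistake is what produced the use-after-free in the KASAN report.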
Re: [Outreachy kernel] [PATCH v2 3/5] staging: octeon: remove typedef declaration for cvmx_fau_reg_32
On Sat, Oct 12, 2019 at 08:37:18PM +0200, Julia Lawall wrote: > > > On Sat, 12 Oct 2019, Wambui Karuga wrote: > > > Remove typedef declaration for enum cvmx_fau_reg_32. > > Also replace its previous uses with new declaration format. > > Issue found by checkpatch.pl > > > > Signed-off-by: Wambui Karuga > > --- > > drivers/staging/octeon/octeon-stubs.h | 14 -- > > 1 file changed, 8 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/staging/octeon/octeon-stubs.h > > b/drivers/staging/octeon/octeon-stubs.h > > index 0991be329139..40f0cfee0dff 100644 > > --- a/drivers/staging/octeon/octeon-stubs.h > > +++ b/drivers/staging/octeon/octeon-stubs.h > > @@ -201,9 +201,9 @@ union cvmx_helper_link_info { > > } s; > > }; > > > > -typedef enum { > > +enum cvmx_fau_reg_32 { > > CVMX_FAU_REG_32_START = 0, > > -} cvmx_fau_reg_32_t; > > +}; > > > > typedef enum { > > CVMX_FAU_OP_SIZE_8 = 0, > > @@ -1178,16 +1178,18 @@ union cvmx_gmxx_rxx_rx_inbnd { > > } s; > > }; > > > > -static inline int32_t cvmx_fau_fetch_and_add32(cvmx_fau_reg_32_t reg, > > +static inline int32_t cvmx_fau_fetch_and_add32(enum cvmx_fau_reg_32 reg, > >int32_t value) > > These int32_t's don't look very desirable either. If there is only one > possible definition, you can just replace it by what it is defined to be. > > julia > Ok, I'll look into refactoring this. 
wambui karuga > > { > > return value; > > } > > > > -static inline void cvmx_fau_atomic_add32(cvmx_fau_reg_32_t reg, int32_t > > value) > > +static inline void cvmx_fau_atomic_add32(enum cvmx_fau_reg_32 reg, > > +int32_t value) > > { } > > > > -static inline void cvmx_fau_atomic_write32(cvmx_fau_reg_32_t reg, int32_t > > value) > > +static inline void cvmx_fau_atomic_write32(enum cvmx_fau_reg_32 reg, > > + int32_t value) > > { } > > > > static inline uint64_t cvmx_scratch_read64(uint64_t address) > > @@ -1364,7 +1366,7 @@ static inline int cvmx_spi_restart_interface(int > > interface, > > } > > > > static inline void cvmx_fau_async_fetch_and_add32(uint64_t scraddr, > > - cvmx_fau_reg_32_t reg, > > + enum cvmx_fau_reg_32 reg, > > int32_t value) > > { } > > > > -- > > 2.23.0 > > > > -- > > You received this message because you are subscribed to the Google Groups > > "outreachy-kernel" group. > > To unsubscribe from this group and stop receiving emails from it, send an > > email to outreachy-kernel+unsubscr...@googlegroups.com. > > To view this discussion on the web visit > > https://groups.google.com/d/msgid/outreachy-kernel/b7216f423d8e06b2ed7ac2df643a9215cd95be32.1570821661.git.wambui.karugax%40gmail.com. > >
[PATCH 0/2] Formatting and style cleanup in rtl8712
This patch series addresses the use of unnecessary return variables and line-breaks in function headers, both in drivers/staging/rtl8712/rtl871x_mp_ioctl.c. Wambui Karuga (2): staging: rtl8712: remove unnecessary return variables staging: rtl8712: clean up function headers drivers/staging/rtl8712/rtl871x_mp_ioctl.c | 103 - 1 file changed, 38 insertions(+), 65 deletions(-) -- 2.23.0
[PATCH 1/2] staging: rtl8712: remove unnecessary return variables
Remove variables that are only used to hold and return constants and have the functions directly return the constants. Issue found by coccinelle: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Wambui Karuga --- drivers/staging/rtl8712/rtl871x_mp_ioctl.c | 46 +- 1 file changed, 19 insertions(+), 27 deletions(-) diff --git a/drivers/staging/rtl8712/rtl871x_mp_ioctl.c b/drivers/staging/rtl8712/rtl871x_mp_ioctl.c index aa8f8500cbb2..8af7892809ca 100644 --- a/drivers/staging/rtl8712/rtl871x_mp_ioctl.c +++ b/drivers/staging/rtl8712/rtl871x_mp_ioctl.c @@ -283,13 +283,12 @@ uint oid_rt_pro_stop_test_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); - uint status = RNDIS_STATUS_SUCCESS; if (poid_par_priv->type_of_oid != SET_OID) return RNDIS_STATUS_NOT_ACCEPTED; if (mp_stop_test(Adapter) == _FAIL) - status = RNDIS_STATUS_NOT_ACCEPTED; - return status; + return RNDIS_STATUS_NOT_ACCEPTED; + return RNDIS_STATUS_SUCCESS; } uint oid_rt_pro_set_channel_direct_call_hdl(struct oid_par_priv @@ -350,64 +349,58 @@ uint oid_rt_pro_set_tx_power_control_hdl( uint oid_rt_pro_query_tx_packet_sent_hdl( struct oid_par_priv *poid_par_priv) { - uint status = RNDIS_STATUS_SUCCESS; struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); - if (poid_par_priv->type_of_oid != QUERY_OID) { - status = RNDIS_STATUS_NOT_ACCEPTED; - return status; - } + if (poid_par_priv->type_of_oid != QUERY_OID) + return RNDIS_STATUS_NOT_ACCEPTED; + if (poid_par_priv->information_buf_len == sizeof(u32)) { *(u32 *)poid_par_priv->information_buf = Adapter->mppriv.tx_pktcount; *poid_par_priv->bytes_rw = poid_par_priv->information_buf_len; } else { - status = RNDIS_STATUS_INVALID_LENGTH; + return RNDIS_STATUS_INVALID_LENGTH; } - return status; + return RNDIS_STATUS_SUCCESS; } uint oid_rt_pro_query_rx_packet_received_hdl( struct oid_par_priv *poid_par_priv) { - uint status = 
RNDIS_STATUS_SUCCESS; struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); - if (poid_par_priv->type_of_oid != QUERY_OID) { - status = RNDIS_STATUS_NOT_ACCEPTED; - return status; - } + if (poid_par_priv->type_of_oid != QUERY_OID) + return RNDIS_STATUS_NOT_ACCEPTED; + if (poid_par_priv->information_buf_len == sizeof(u32)) { *(u32 *)poid_par_priv->information_buf = Adapter->mppriv.rx_pktcount; *poid_par_priv->bytes_rw = poid_par_priv->information_buf_len; } else { - status = RNDIS_STATUS_INVALID_LENGTH; + return RNDIS_STATUS_INVALID_LENGTH; } - return status; + return RNDIS_STATUS_SUCCESS; } uint oid_rt_pro_query_rx_packet_crc32_error_hdl( struct oid_par_priv *poid_par_priv) { - uint status = RNDIS_STATUS_SUCCESS; struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); - if (poid_par_priv->type_of_oid != QUERY_OID) { - status = RNDIS_STATUS_NOT_ACCEPTED; - return status; - } + if (poid_par_priv->type_of_oid != QUERY_OID) + return RNDIS_STATUS_NOT_ACCEPTED; + if (poid_par_priv->information_buf_len == sizeof(u32)) { *(u32 *)poid_par_priv->information_buf = Adapter->mppriv.rx_crcerrpktcount; *poid_par_priv->bytes_rw = poid_par_priv->information_buf_len; } else { - status = RNDIS_STATUS_INVALID_LENGTH; + return RNDIS_STATUS_INVALID_LENGTH; } - return status; + return RNDIS_STATUS_SUCCESS; } uint oid_rt_pro_reset_tx_packet_sent_hdl(struct oid_par_priv @@ -425,7 +418,6 @@ uint oid_rt_pro_reset_tx_packet_sent_hdl(struct oid_par_priv uint oid_rt_pro_reset_rx_packet_received_hdl(struct oid_par_priv *poid_par_priv) { - uint status = RNDIS_STATUS_SUCCESS; struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -435,9 +427,9 @@ uint oid_rt_pro_reset_rx_packet_received_hdl(struct oid_par_priv Adapter->mppriv.rx_pktcount = 0;
[PATCH 2/2] staging: rtl8712: clean up function headers
Remove unnecessary line-breaks in function headers to improve readability of function headers. Signed-off-by: Wambui Karuga --- drivers/staging/rtl8712/rtl871x_mp_ioctl.c | 57 -- 1 file changed, 19 insertions(+), 38 deletions(-) diff --git a/drivers/staging/rtl8712/rtl871x_mp_ioctl.c b/drivers/staging/rtl8712/rtl871x_mp_ioctl.c index 8af7892809ca..29b85330815f 100644 --- a/drivers/staging/rtl8712/rtl871x_mp_ioctl.c +++ b/drivers/staging/rtl8712/rtl871x_mp_ioctl.c @@ -231,8 +231,7 @@ static int mp_stop_test(struct _adapter *padapter) return _SUCCESS; } -uint oid_rt_pro_set_data_rate_hdl(struct oid_par_priv -*poid_par_priv) +uint oid_rt_pro_set_data_rate_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -291,8 +290,7 @@ uint oid_rt_pro_stop_test_hdl(struct oid_par_priv *poid_par_priv) return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_set_channel_direct_call_hdl(struct oid_par_priv - *poid_par_priv) +uint oid_rt_pro_set_channel_direct_call_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -327,8 +325,7 @@ uint oid_rt_pro_set_antenna_bb_hdl(struct oid_par_priv *poid_par_priv) return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_set_tx_power_control_hdl( - struct oid_par_priv *poid_par_priv) +uint oid_rt_pro_set_tx_power_control_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -346,8 +343,7 @@ uint oid_rt_pro_set_tx_power_control_hdl( return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_query_tx_packet_sent_hdl( - struct oid_par_priv *poid_par_priv) +uint oid_rt_pro_query_tx_packet_sent_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -365,8 +361,7 @@ uint oid_rt_pro_query_tx_packet_sent_hdl( return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_query_rx_packet_received_hdl( - struct 
oid_par_priv *poid_par_priv) +uint oid_rt_pro_query_rx_packet_received_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -384,8 +379,7 @@ uint oid_rt_pro_query_rx_packet_received_hdl( return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_query_rx_packet_crc32_error_hdl( - struct oid_par_priv *poid_par_priv) +uint oid_rt_pro_query_rx_packet_crc32_error_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -403,8 +397,7 @@ uint oid_rt_pro_query_rx_packet_crc32_error_hdl( return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_reset_tx_packet_sent_hdl(struct oid_par_priv - *poid_par_priv) +uint oid_rt_pro_reset_tx_packet_sent_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -415,8 +408,7 @@ uint oid_rt_pro_reset_tx_packet_sent_hdl(struct oid_par_priv return RNDIS_STATUS_SUCCESS; } -uint oid_rt_pro_reset_rx_packet_received_hdl(struct oid_par_priv - *poid_par_priv) +uint oid_rt_pro_reset_rx_packet_received_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -432,8 +424,7 @@ uint oid_rt_pro_reset_rx_packet_received_hdl(struct oid_par_priv return RNDIS_STATUS_SUCCESS; } -uint oid_rt_reset_phy_rx_packet_count_hdl(struct oid_par_priv -*poid_par_priv) +uint oid_rt_reset_phy_rx_packet_count_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -444,8 +435,7 @@ uint oid_rt_reset_phy_rx_packet_count_hdl(struct oid_par_priv return RNDIS_STATUS_SUCCESS; } -uint oid_rt_get_phy_rx_packet_received_hdl(struct oid_par_priv - *poid_par_priv) +uint oid_rt_get_phy_rx_packet_received_hdl(struct oid_par_priv *poid_par_priv) { struct _adapter *Adapter = (struct _adapter *) (poid_par_priv->adapter_context); @@ -460,8 +450,7 @@
[PATCH] drivers: firmware: psci: use kernel restart handler functionality
From: Stefan Agner Use the kernels restart handler to register the PSCI system reset capability. The restart handler use notifier chains along with priorities. This allows to use restart handlers with higher priority (in case available) while still supporting PSCI. Since the ARM handler had priority over the kernels restart handler before this patch, use a slightly elevated priority of 160 to make sure PSCI is used before most of the other handlers are called. Signed-off-by: Stefan Agner --- drivers/firmware/psci/psci.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c index 84f4ff351c62..d8677b54132f 100644 --- a/drivers/firmware/psci/psci.c +++ b/drivers/firmware/psci/psci.c @@ -82,6 +82,7 @@ static u32 psci_function_id[PSCI_FN_MAX]; static u32 psci_cpu_suspend_feature; static bool psci_system_reset2_supported; +static struct notifier_block psci_restart_handler; static inline bool psci_has_ext_power_state(void) { @@ -250,7 +251,8 @@ static int get_set_conduit_method(struct device_node *np) return 0; } -static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd) +static int psci_sys_reset(struct notifier_block *this, + unsigned long reboot_mode, void *cmd) { if ((reboot_mode == REBOOT_WARM || reboot_mode == REBOOT_SOFT) && psci_system_reset2_supported) { @@ -263,6 +265,8 @@ static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd) } else { invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0); } + + return NOTIFY_DONE; } static void psci_sys_poweroff(void) @@ -411,6 +415,8 @@ static void __init psci_init_smccc(void) static void __init psci_0_2_set_functions(void) { + int ret; + pr_info("Using standard PSCI v0.2 function IDs\n"); psci_ops.get_version = psci_get_version; @@ -431,7 +437,14 @@ static void __init psci_0_2_set_functions(void) psci_ops.migrate_info_type = psci_migrate_info_type; - arm_pm_restart = psci_sys_reset; + 
psci_restart_handler.notifier_call = psci_sys_reset; + psci_restart_handler.priority = 160; + + ret = register_restart_handler(&psci_restart_handler); + if (ret) { + pr_err("Cannot register restart handler, %d\n", ret); + return; + } pm_power_off = psci_sys_poweroff; } -- 2.23.0
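As an illustration of the priority ordering the commit message relies on, here is a minimal user-space sketch, not kernel code. The names `struct handler`, `register_handler` and `run_chain` are hypothetical stand-ins for the kernel notifier-chain API, and the handlers only record the order in which they run. The priority 160 is taken from the patch; 128 and 192 are assumed values for a default and a higher-priority handler.

```c
#include <stddef.h>

/* Hypothetical return codes for this sketch (not the kernel's NOTIFY_* values). */
enum { CHAIN_CONTINUE = 0, CHAIN_STOP = 1 };

struct handler {
    int priority;               /* higher priority runs first */
    int (*call)(void *ctx);
    struct handler *next;
};

/* Insert h so that the list stays sorted by descending priority. */
static void register_handler(struct handler **head, struct handler *h)
{
    while (*head && (*head)->priority >= h->priority)
        head = &(*head)->next;
    h->next = *head;
    *head = h;
}

/* Call handlers in list order; return how many were invoked. */
static int run_chain(struct handler *head, void *ctx)
{
    int calls = 0;

    for (; head; head = head->next) {
        calls++;
        if (head->call(ctx) == CHAIN_STOP)
            break;
    }
    return calls;
}

/* Records the order in which handlers ran. */
struct trace {
    char order[8];
    int len;
};

static int record(void *ctx, char tag)
{
    struct trace *t = ctx;

    t->order[t->len++] = tag;
    t->order[t->len] = '\0';
    return CHAIN_CONTINUE;      /* pretend the reset did not take effect */
}

static int board_reset(void *ctx) { return record(ctx, 'B'); } /* assumed prio 128 */
static int psci_reset(void *ctx)  { return record(ctx, 'P'); } /* prio 160, as in the patch */
static int pmic_reset(void *ctx)  { return record(ctx, 'M'); } /* assumed prio 192 */
```

Registering the three handlers in any order and then calling `run_chain()` invokes them as M, P, B: a handler registered above 160 gets the first chance to reset the system, and PSCI still runs ahead of handlers at the assumed default of 128.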
Re: [GIT PULL] MIPS fixes
The pull request you sent on Sat, 12 Oct 2019 19:04:14 +: > git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git > tags/mips_fixes_5.4_2 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/63f9bff56beb718ac0a2eb8398a98220b1e119dc Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] xen: fixes for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 12:51:31 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git > for-linus-5.4-rc3-tag has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/680b5b3c5d34b22695357e17b6bdd0abd83e6b1c Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] s390 updates for 5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 12:25:39 +0200: > git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git tags/s390-5.4-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/f154988a905e5cad9d1a20d4c4aeb176968fe3be Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] RISC-V updates for v5.4-rc3
The pull request you sent on Sat, 12 Oct 2019 13:10:52 -0700 (PDT): > git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git > tags/riscv/for-v5.4-rc3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/48acba989ed5d8707500193048d6c4c5945d5f43 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.4-3 tag
The pull request you sent on Sat, 12 Oct 2019 22:37:15 +1100: > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git > tags/powerpc-5.4-3 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/db60a5a035aa8692dc7cee293356bdcc078fa7b7 Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: clk: rockchip: Checking a kmemdup() call in rockchip_clk_register_pll()
Hi Markus, On Saturday, 12 October 2019, 15:55:44 CEST, Markus Elfring wrote: > I tried another script for the semantic patch language out. > This source code analysis approach points out that the implementation > of the function “rockchip_clk_register_pll” contains also a call > of the function “kmemdup”. > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/rockchip/clk-pll.c?id=1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a#n913 > https://elixir.bootlin.com/linux/v5.4-rc2/source/drivers/clk/rockchip/clk-pll.c#L913 > > * Do you find the usage of the format string “%s: could not allocate > rate table for %s\n” still appropriate at this place? If kmemdup() now prints its own "no-memory" message internally, I guess the one in the clock driver would be a duplicate and could go away. > * Is there a need to adjust the error handling here? There is no need for additional error handling. If the rate table could not be duplicated, the clock will still report the correct clock rate; you just cannot set a new rate. And for a system it's always better to have the clock driver present than for all device drivers to fail probing, especially as this starts as a core clock driver, so no deferring is possible. Heiko
[PATCH] mm: memblock: do not enforce current limit for memblock_phys* family
From: Mike Rapoport Until commit 92d12f9544b7 ("memblock: refactor internal allocation functions") the maximal address for memblock allocations was forced to memblock.current_limit only for the allocation functions returning virtual address. The changes introduced by that commit moved the limit enforcement into the allocation core and as a result the allocation functions returning physical address also started to limit allocations to memblock.current_limit. This caused breakage of etnaviv GPU driver: [3.682347] etnaviv etnaviv: bound 13.gpu (ops gpu_ops) [3.688669] etnaviv etnaviv: bound 134000.gpu (ops gpu_ops) [3.695099] etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops) [3.700800] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108 [3.723013] etnaviv-gpu 13.gpu: command buffer outside valid memory window [3.731308] etnaviv-gpu 134000.gpu: model: GC320, revision: 5007 [3.752437] etnaviv-gpu 134000.gpu: command buffer outside valid memory window [3.760583] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215 [3.766766] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0 Restore the behaviour of memblock_phys* family so that these functions will not enforce memblock.current_limit. 
Fixes: 92d12f9544b7 ("memblock: refactor internal allocation functions") Reported-by: Adam Ford Tested-by: Adam Ford #imx6q-logicpd Signed-off-by: Mike Rapoport --- mm/memblock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/memblock.c b/mm/memblock.c index 7d4f61a..c4b16ca 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1356,9 +1356,6 @@ static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size, align = SMP_CACHE_BYTES; } - if (end > memblock.current_limit) - end = memblock.current_limit; - again: found = memblock_find_in_range_node(size, align, start, end, nid, flags); @@ -1469,6 +1466,9 @@ static void * __init memblock_alloc_internal( if (WARN_ON_ONCE(slab_is_available())) return kzalloc_node(size, GFP_NOWAIT, nid); + if (max_addr > memblock.current_limit) + max_addr = memblock.current_limit; + alloc = memblock_alloc_range_nid(size, align, min_addr, max_addr, nid); /* retry allocation without lower limit */ -- 2.7.4
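The effect of moving the clamp can be sketched in user-space C as follows. This is only an illustration under simplifying assumptions: the whole range is treated as free and allocation is top-down, whereas the real memblock code searches its region lists. All names ending in `_sketch` are hypothetical.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

struct memblock_sketch {
    phys_addr_t current_limit;  /* highest address usable via the linear mapping */
};

/* Core range allocator: after the patch it uses `end` exactly as given. */
static phys_addr_t alloc_range_sketch(phys_addr_t size, phys_addr_t start,
                                      phys_addr_t end)
{
    if (end < start || end - start < size)
        return 0;               /* no room */
    return end - size;          /* top-down: highest block below end */
}

/* memblock_phys_* style: returns a physical address, no clamp applied. */
static phys_addr_t phys_alloc_sketch(const struct memblock_sketch *mb,
                                     phys_addr_t size, phys_addr_t start,
                                     phys_addr_t end)
{
    (void)mb;                   /* current_limit is intentionally ignored */
    return alloc_range_sketch(size, start, end);
}

/* memblock_alloc_* style: the caller wants a virtual address, so clamp. */
static phys_addr_t virt_alloc_sketch(const struct memblock_sketch *mb,
                                     phys_addr_t size, phys_addr_t min_addr,
                                     phys_addr_t max_addr)
{
    if (max_addr > mb->current_limit)
        max_addr = mb->current_limit;
    return alloc_range_sketch(size, min_addr, max_addr);
}
```

With a limit of 0x30000000 and a requested range ending at 0x80000000, the physical variant may return memory just below 0x80000000, while the virtual variant stays below the limit.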
Re: [PATCH v5 bpf-next 09/15] samples/bpf: use own flags but not HOSTCFLAGS
On Fri, Oct 11, 2019 at 02:16:05PM +0300, Sergei Shtylyov wrote: On 10/11/2019 12:57 PM, Ivan Khoronzhuk wrote: While compiling natively, the host's cflags and ldflags are equal to ones used from HOSTCFLAGS and HOSTLDFLAGS. When cross compiling it should have own, used for target arch. While verification, for arm, While verifying. While verification stage. While *in* verification stage, "while" doesn't combine with nouns w/o a preposition. Sergei, better add me in cc list when msg is to me I can miss it. Regarding the language lesson, thanks, I will keep it in mind next time, but the issue is not rude, if it's an issue at all, so I better leave it as is, as not reasons to correct it w/o code changes and everyone is able to understand it. arm64 and x86_64 the following flags were used always: -Wall -O2 -fomit-frame-pointer -Wmissing-prototypes -Wstrict-prototypes So, add them as they were verified and used before adding Makefile.target and lets omit "-fomit-frame-pointer" as were proposed while review, as no sense in such optimization for samples. Signed-off-by: Ivan Khoronzhuk [...] MBR, Sergei -- Regards, Ivan Khoronzhuk
Re: [PATCH v3 2/2] iio: (bma400) add driver for the BMA400
> > No comment other than thank you for ignoring my previous comments. :( > Oh, no! Sorry, it looks like I missed you in the To and CC list in my reply :/ Cheers, - Dan
Re: Linux 5.3.6
On Sat, 12 Oct 2019 at 21:16, Chris Clayton wrote: > > > > I'm announcing the release of the 5.3.6 kernel. > > > 5.3.6 build fails here with: > > arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found > CC arch/x86/kernel/cpu/mce/threshold.o > make[3]: *** [arch/x86/entry/vdso/Makefile:59: > arch/x86/entry/vdso/vdso64.so.dbg] Error 1 > make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg' > make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2 > make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2 > make[1]: *** Waiting for unfinished jobs > What is your default linker? Also, does make LD=ld.bfd fix that for you? See https://bugzilla.kernel.org/show_bug.cgi?id=204951 BR, Gabriel C.
Re: [RFC PATCH net] net: phy: Fix "link partner" information disappear issue
On 11.10.2019 07:55, Yonglong Liu wrote: > > > On 2019/10/11 3:17, Heiner Kallweit wrote: >> On 10.10.2019 11:30, Yonglong Liu wrote: >>> Some drivers just call phy_ethtool_ksettings_set() to set the >>> links, for those phy drivers that use genphy_read_status(), if >>> autoneg is on, and the link is up, than execute "ethtool -s >>> ethx autoneg on" will cause "link partner" information disappear. >>> >>> The call trace is phy_ethtool_ksettings_set()->phy_start_aneg() >>> ->linkmode_zero(phydev->lp_advertising)->genphy_read_status(), >>> the link didn't change, so genphy_read_status() just return, and >>> phydev->lp_advertising is zero now. >>> >> I think that clearing link partner advertising info in >> phy_start_aneg() is questionable. If advertising doesn't change >> then phy_config_aneg() basically is a no-op. Instead we may have >> to clear the link partner advertising info in genphy_read_lpa() >> if aneg is disabled or aneg isn't completed (basically the same >> as in genphy_c45_read_lpa()). Something like: >> >> if (!phydev->autoneg_complete) { /* also covers case that aneg is disabled */ >> linkmode_zero(phydev->lp_advertising); >> } else if (phydev->autoneg == AUTONEG_ENABLE) { >> ... >> } >> > > If clear the link partner advertising info in genphy_read_lpa() and > genphy_c45_read_lpa(), for the drivers that use genphy_read_status() > is ok, but for those drivers that use there own read_status() may > have problem, like aqr_read_status(), it will update lp_advertising > first, and than call genphy_c45_read_status(), so will cause > lp_advertising lost. > Right, in genphy_read_lpa() we shouldn't clear all lpa bits but only those ones the generic functions care about. Basically the same as in the c45 version. Then a vendor-specific part isn't affected. aqr_read_status() is a good example. It deals with 1Gbps mode that isn't covered by the generic c45 functions. Therefore the 1Gbps-related bits won't be overwritten by the generic functions. 
> Another question, please see genphy_c45_read_status(), if clear the > link partner advertising info in genphy_c45_read_lpa(), if autoneg is > off, phydev->lp_advertising will not clear. > If autoneg is off, lp_advertising should never be set, so there's nothing to clear. However we may have to look at the case that user switches to fixed speed mode via ethtool. >>> This patch call genphy_read_lpa() before the link state judgement >>> to fix this problem. >>> >>> Fixes: 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in >>> genphy_read_status") >>> Signed-off-by: Yonglong Liu >>> --- >>> drivers/net/phy/phy_device.c | 8 >>> 1 file changed, 4 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c >>> index 9d2bbb1..ef3073c 100644 >>> --- a/drivers/net/phy/phy_device.c >>> +++ b/drivers/net/phy/phy_device.c >>> @@ -1839,6 +1839,10 @@ int genphy_read_status(struct phy_device *phydev) >>> if (err) >>> return err; >>> >>> + err = genphy_read_lpa(phydev); >>> + if (err < 0) >>> + return err; >>> + >>> /* why bother the PHY if nothing can have changed */ >>> if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link) >>> return 0; >>> @@ -1848,10 +1852,6 @@ int genphy_read_status(struct phy_device *phydev) >>> phydev->pause = 0; >>> phydev->asym_pause = 0; >>> >>> - err = genphy_read_lpa(phydev); >>> - if (err < 0) >>> - return err; >>> - >>> if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) { >>> phy_resolve_aneg_linkmode(phydev); >>> } else if (phydev->autoneg == AUTONEG_DISABLE) { >>> >> >> >> . >> > >
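The point about clearing only the bits the generic code owns can be sketched with a plain bitmask. The bit layout below is hypothetical and chosen only for illustration; the kernel uses linkmode bitmaps, not a single integer, and `read_lpa_sketch` is not a real phylib function.

```c
#include <stdint.h>

/* Hypothetical bit layout, for this sketch only. */
#define LP_10_100        0x000fu  /* modes the generic code reads from the PHY */
#define LP_1000          0x0030u
#define LP_GENERIC_MASK  (LP_10_100 | LP_1000)
#define LP_VENDOR_EXTRA  0x0100u  /* set earlier by a vendor read_status() */

/*
 * Refresh the link partner advertisement: stale generic bits are always
 * dropped, vendor-specific bits are left alone, and fresh generic bits
 * are merged in only once autonegotiation has completed.
 */
static uint32_t read_lpa_sketch(uint32_t lp_advertising, int aneg_complete,
                                uint32_t hw_generic_bits)
{
    lp_advertising &= ~LP_GENERIC_MASK;
    if (aneg_complete)
        lp_advertising |= hw_generic_bits & LP_GENERIC_MASK;
    return lp_advertising;
}
```

In this model a vendor bit set before the generic read survives whether or not autonegotiation has completed, which is the property discussed for aqr_read_status().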
Re: [PATCH RFC v1 0/2] vhost: ring format independence
On Sat, Oct 12, 2019 at 03:31:50PM +0800, Jason Wang wrote: > > On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote: > > So the idea is as follows: we convert descriptors to an > > independent format first, and process that converting to > > iov later. > > > > The point is that we have a tight loop that fetches > > descriptors, which is good for cache utilization. > > This will also allow all kind of batching tricks - > > e.g. it seems possible to keep SMAP disabled while > > we are fetching multiple descriptors. > > > > And perhaps more importantly, this is a very good fit for the packed > > ring layout, where we get and put descriptors in order. > > > > This patchset seems to already perform exactly the same as the original > > code already based on a microbenchmark. More testing would be very much > > appreciated. > > > > Biggest TODO before this first step is ready to go in is to > > batch indirect descriptors as well. > > > > Integrating into vhost-net is basically > > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ - > > or add a module parameter like I did in the test module. > > > It would be better to convert vhost_net then I can do some benchmark on > that. > > Thanks Sure, I'll post a small patch that does this. > > > > > > > > > Michael S. Tsirkin (2): > >vhost: option to fetch descriptors through an independent struct > >vhost: batching fetches > > > > drivers/vhost/test.c | 19 ++- > > drivers/vhost/vhost.c | 333 +- > > drivers/vhost/vhost.h | 20 ++- > > 3 files changed, 365 insertions(+), 7 deletions(-) > >
Re: [PATCH RFC v1 2/2] vhost: batching fetches
On Sat, Oct 12, 2019 at 03:30:52PM +0800, Jason Wang wrote: > > On 2019/10/11 9:46 PM, Michael S. Tsirkin wrote: > > With this patch applied, new and old code perform identically. > > > > Lots of extra optimizations are now possible, e.g. > > we can fetch multiple heads with copy_from/to_user now. > > We can get rid of maintaining the log array. Etc etc. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > drivers/vhost/test.c | 2 +- > > drivers/vhost/vhost.c | 50 --- > > drivers/vhost/vhost.h | 4 +++- > > 3 files changed, 46 insertions(+), 10 deletions(-) > > > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > index 39a018a7af2d..e3a8e9db22cd 100644 > > --- a/drivers/vhost/test.c > > +++ b/drivers/vhost/test.c > > @@ -128,7 +128,7 @@ static int vhost_test_open(struct inode *inode, struct > > file *f) > > dev = &n->dev; > > vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ]; > > n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick; > > - vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV, > > + vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64, > >VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT); > > f->private_data = n; > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > index 36661d6cb51f..aa383e847865 100644 > > --- a/drivers/vhost/vhost.c > > +++ b/drivers/vhost/vhost.c > > @@ -302,6 +302,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, > > { > > vq->num = 1; > > vq->ndescs = 0; > > + vq->first_desc = 0; > > vq->desc = NULL; > > vq->avail = NULL; > > vq->used = NULL; > > @@ -390,6 +391,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev > > *dev) > > for (i = 0; i < dev->nvqs; ++i) { > > vq = dev->vqs[i]; > > vq->max_descs = dev->iov_limit; > > + vq->batch_descs = dev->iov_limit - UIO_MAXIOV; > > vq->descs = kmalloc_array(vq->max_descs, > > sizeof(*vq->descs), > > GFP_KERNEL); > > @@ -2366,6 +2368,8 @@ static void pop_split_desc(struct vhost_virtqueue *vq) > > --vq->ndescs; > > } > > +#define VHOST_DESC_FLAGS 
(VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \ > > + VRING_DESC_F_NEXT) > > static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc > > *desc, u16 id) > > { > > struct vhost_desc *h; > > @@ -2375,7 +2379,7 @@ static int push_split_desc(struct vhost_virtqueue > > *vq, struct vring_desc *desc, > > h = &vq->descs[vq->ndescs++]; > > h->addr = vhost64_to_cpu(vq, desc->addr); > > h->len = vhost32_to_cpu(vq, desc->len); > > - h->flags = vhost16_to_cpu(vq, desc->flags); > > + h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS; > > h->id = id; > > return 0; > > @@ -2450,7 +2454,7 @@ static int fetch_indirect_descs(struct > > vhost_virtqueue *vq, > > return 0; > > } > > -static int fetch_descs(struct vhost_virtqueue *vq) > > +static int fetch_buf(struct vhost_virtqueue *vq) > > { > > struct vring_desc desc; > > unsigned int i, head, found = 0; > > @@ -2462,7 +2466,11 @@ static int fetch_descs(struct vhost_virtqueue *vq) > > /* Check it isn't doing very strange things with descriptor numbers. */ > > last_avail_idx = vq->last_avail_idx; > > - if (vq->avail_idx == vq->last_avail_idx) { > > + if (unlikely(vq->avail_idx == vq->last_avail_idx)) { > > + /* If we already have work to do, don't bother re-checking. */ > > + if (likely(vq->ndescs)) > > + return vq->num; > > + > > if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) { > > vq_err(vq, "Failed to access avail idx at %p\n", > > &vq->avail->idx); > > @@ -2541,6 +2549,24 @@ static int fetch_descs(struct vhost_virtqueue *vq) > > return 0; > > } > > +static int fetch_descs(struct vhost_virtqueue *vq) > > +{ > > + int ret = 0; > > + > > + if (unlikely(vq->first_desc >= vq->ndescs)) { > > + vq->first_desc = 0; > > + vq->ndescs = 0; > > + } > > + > > + if (vq->ndescs) > > + return 0; > > + > > + while (!ret && vq->ndescs <= vq->batch_descs) > > + ret = fetch_buf(vq); > > > It looks to me descriptor chaining might be broken here. It should work because fetch_buf fetches a whole buf, following the chain. 
Seems to work in a small test ... what issues do you see? > > > + > > + return vq->ndescs ? 0 : ret; > > +} > > + > > /* This looks in the virtqueue and for the first available buffer, and > > converts > >* it to an iovec for convenient access. Since descriptors consist of > > some > >* number of output then some number of input descriptors, it's actually > > two > > @@ -2562,6 +2588,8 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue > > *vq, > > if (ret) > > return ret; > > + /* Note: indirect
Re: [PATCH v3 2/2] iio: (bma400) add driver for the BMA400
On 10/12/19 12:25 PM, Dan Robertson wrote: > Add a IIO driver for the Bosch BMA400 3-axes ultra-low power accelerometer. > The driver supports reading from the acceleration and temperature > registers. The driver also supports reading and configuring the output data > rate, oversampling ratio, and scale. > > Signed-off-by: Dan Robertson > --- > drivers/iio/accel/Kconfig | 18 + > drivers/iio/accel/Makefile | 2 + > drivers/iio/accel/bma400.h | 80 > drivers/iio/accel/bma400_core.c | 788 > drivers/iio/accel/bma400_i2c.c | 60 +++ > 5 files changed, 948 insertions(+) > create mode 100644 drivers/iio/accel/bma400.h > create mode 100644 drivers/iio/accel/bma400_core.c > create mode 100644 drivers/iio/accel/bma400_i2c.c No comment other than thank you for ignoring my previous comments. :( -- ~Randy
[GIT PULL] mtd: Fixes for v5.4-rc3
Linus, The following changes since commit 54ecb8f7028c5eb3d740bb82b0f1d90f2df63c5c: Linux 5.4-rc1 (2019-09-30 10:35:40 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git tags/fixes-for-5.4-rc3 for you to fetch changes up to df8fed831cbcdce7b283b2d9c1aadadcf8940d05: mtd: rawnand: au1550nd: Fix au_read_buf16() prototype (2019-10-07 09:56:36 +0200) This pull request contains two fixes for MTD: - spi-nor: Fix for a regression in write_sr() - rawnand: Regression fix for the au1550nd driver Paul Burton (1): mtd: rawnand: au1550nd: Fix au_read_buf16() prototype Tudor Ambarus (1): mtd: spi-nor: Fix direction of the write_sr() transfer drivers/mtd/nand/raw/au1550nd.c | 5 ++--- drivers/mtd/spi-nor/spi-nor.c | 2 +- 2 files changed, 3 insertions(+), 4 deletions(-)
Re: [PATCH RFC v1 1/2] vhost: option to fetch descriptors through an independent struct
On Sat, Oct 12, 2019 at 03:28:49PM +0800, Jason Wang wrote: > > On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote: > > The idea is to support multiple ring formats by converting > > to a format-independent array of descriptors. > > > > This costs extra cycles, but we gain in ability > > to fetch a batch of descriptors in one go, which > > is good for code cache locality. > > > > To simplify benchmarking, I kept the old code > > around so one can switch back and forth by > > writing into a module parameter. > > This will go away in the final submission. > > > > This patch causes a minor performance degradation, > > it's been kept as simple as possible for ease of review. > > Next patch gets us back the performance by adding batching. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > drivers/vhost/test.c | 17 ++- > > drivers/vhost/vhost.c | 299 +- > > drivers/vhost/vhost.h | 16 +++ > > 3 files changed, 327 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c > > index 056308008288..39a018a7af2d 100644 > > --- a/drivers/vhost/test.c > > +++ b/drivers/vhost/test.c > > @@ -18,6 +18,9 @@ > > #include "test.h" > > #include "vhost.h" > > +static int newcode = 0; > > +module_param(newcode, int, 0644); > > + > > /* Max number of bytes transferred before requeueing the job. > >* Using this limit prevents one virtqueue from starving others. */ > > #define VHOST_TEST_WEIGHT 0x8 > > @@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n) > > vhost_disable_notify(&n->dev, vq); > > for (;;) { > > - head = vhost_get_vq_desc(vq, vq->iov, > > -ARRAY_SIZE(vq->iov), > > -&out, &in, > > -NULL, NULL); > > + if (newcode) > > + head = vhost_get_vq_desc_batch(vq, vq->iov, > > + ARRAY_SIZE(vq->iov), > > + &out, &in, > > + NULL, NULL); > > + else > > + head = vhost_get_vq_desc(vq, vq->iov, > > +ARRAY_SIZE(vq->iov), > > +&out, &in, > > +NULL, NULL); > > /* On error, stop handling until the next kick. 
*/ > > if (unlikely(head < 0)) > > break; > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > > index 36ca2cf419bf..36661d6cb51f 100644 > > --- a/drivers/vhost/vhost.c > > +++ b/drivers/vhost/vhost.c > > @@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, > >struct vhost_virtqueue *vq) > > { > > vq->num = 1; > > + vq->ndescs = 0; > > vq->desc = NULL; > > vq->avail = NULL; > > vq->used = NULL; > > @@ -369,6 +370,9 @@ static int vhost_worker(void *data) > > static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) > > { > > + kfree(vq->descs); > > + vq->descs = NULL; > > + vq->max_descs = 0; > > kfree(vq->indirect); > > vq->indirect = NULL; > > kfree(vq->log); > > @@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev > > *dev) > > for (i = 0; i < dev->nvqs; ++i) { > > vq = dev->vqs[i]; > > + vq->max_descs = dev->iov_limit; > > + vq->descs = kmalloc_array(vq->max_descs, > > + sizeof(*vq->descs), > > + GFP_KERNEL); > > > Is iov_limit too much here? It can obviously increase the footprint. I guess > the batching can only be done for descriptor without indirect or next set. > Then we may batch 16 or 64. > > Thanks Yes, next patch only batches up to 64. But we do need iov_limit because guest can pass a long chain of scatter/gather. We already have iovecs in a huge array so this does not look like a big deal. If we ever teach the code to avoid the huge iov arrays by handling huge s/g lists piece by piece, we can make the desc array smaller at the same point.
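The sizing argument above (a batch of descriptors plus room for one long chain) can be sketched in user-space C. This is a simplified model, not the vhost code: `struct vq_sketch`, the constants and the `RING_EMPTY` return code are assumptions made for illustration, and a "ring" here is just a list of chain lengths. The shape of `fetch_descs()` follows the batching patch in this series.

```c
#include <stddef.h>

#define MAX_CHAIN   8                          /* stands in for UIO_MAXIOV */
#define BATCH_DESCS 4                          /* stands in for the batch of 64 */
#define MAX_DESCS   (MAX_CHAIN + BATCH_DESCS)  /* stands in for iov_limit */
#define RING_EMPTY  1                          /* hypothetical "nothing available" code */

/* Format-independent descriptor, reduced to the fields the sketch needs. */
struct desc_sketch {
    unsigned int len;
    unsigned short id;          /* buffer the descriptor belongs to */
};

struct vq_sketch {
    struct desc_sketch descs[MAX_DESCS];
    int ndescs;                 /* descriptors currently cached */
    int first_desc;             /* next cached descriptor to hand out */
    const int *chains;          /* chain length of each pending buffer */
    int nchains;
    int next_chain;
};

/* Fetch one whole buffer, i.e. every descriptor in its chain. */
static int fetch_buf(struct vq_sketch *vq)
{
    int i, n;

    if (vq->next_chain == vq->nchains)
        return RING_EMPTY;
    n = vq->chains[vq->next_chain];
    if (n > MAX_CHAIN || vq->ndescs + n > MAX_DESCS)
        return -1;              /* chain too long for the cache */
    for (i = 0; i < n; i++) {
        vq->descs[vq->ndescs].len = 0;
        vq->descs[vq->ndescs].id = (unsigned short)vq->next_chain;
        vq->ndescs++;
    }
    vq->next_chain++;
    return 0;
}

/* Refill the cache once it has been fully consumed. */
static int fetch_descs(struct vq_sketch *vq)
{
    int ret = 0;

    if (vq->first_desc >= vq->ndescs) {
        vq->first_desc = 0;
        vq->ndescs = 0;
    }
    if (vq->ndescs)
        return 0;               /* still have cached work */
    while (!ret && vq->ndescs <= BATCH_DESCS)
        ret = fetch_buf(vq);
    return vq->ndescs ? 0 : ret;
}
```

Because the loop only starts another fetch while at most BATCH_DESCS descriptors are cached, and one buffer adds at most MAX_CHAIN descriptors, the cache never needs more than MAX_CHAIN + BATCH_DESCS entries in this model.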
[GIT PULL] RISC-V updates for v5.4-rc3
Linus, The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce: Linux 5.4-rc2 (2019-10-06 14:27:30 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git tags/riscv/for-v5.4-rc3 for you to fetch changes up to cd9e72b80090a8cd7d84a47a30a06fa92ff277d1: RISC-V: entry: Remove unneeded need_resched() loop (2019-10-09 16:48:27 -0700) RISC-V updates for v5.4-rc3 Some RISC-V fixes for v5.4-rc3: - Fix several bugs in the breakpoint trap handler - Drop an unnecessary loop around calls to preempt_schedule_irq() Valentin Schneider (1): RISC-V: entry: Remove unneeded need_resched() loop Vincent Chen (3): riscv: avoid kernel hangs when trapped in BUG() riscv: avoid sending a SIGTRAP to a user thread trapped in WARN() riscv: Correct the handling of unexpected ebreak in do_trap_break() arch/riscv/kernel/entry.S | 3 +-- arch/riscv/kernel/traps.c | 14 +++--- 2 files changed, 8 insertions(+), 9 deletions(-)
[PATCH] arm64: dts: sun50i: sopine-baseboard: Expose serial1, serial2 and serial3
Follow what the sun50i-a64-pine64.dts does and expose all 5 serial connections. Signed-off-by: Alistair Francis --- .../allwinner/sun50i-a64-sopine-baseboard.dts | 25 +++ 1 file changed, 25 insertions(+) diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts index 124b0b030b28..49c37b21ab36 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-sopine-baseboard.dts @@ -56,6 +56,10 @@ aliases { ethernet0 = serial0 = + serial1 = + serial2 = + serial3 = + serial4 = }; chosen { @@ -280,6 +284,27 @@ }; }; +/* On Pi-2 connector */ + { + pinctrl-names = "default"; + pinctrl-0 = <_pins>; + status = "disabled"; +}; + +/* On Euler connector */ + { + pinctrl-names = "default"; + pinctrl-0 = <_pins>; + status = "disabled"; +}; + +/* On Euler connector, RTS/CTS optional */ + { + pinctrl-names = "default"; + pinctrl-0 = <_pins>; + status = "disabled"; +}; + _otg { dr_mode = "host"; status = "okay"; -- 2.23.0
[PATCH v3 1/2] dt-bindings: iio: accel: bma400: add bindings
Add devicetree binding for the Bosch BMA400 3-axes ultra-low power accelerometer sensor. Signed-off-by: Dan Robertson --- .../devicetree/bindings/iio/accel/bma400.yaml | 39 +++ 1 file changed, 39 insertions(+) create mode 100644 Documentation/devicetree/bindings/iio/accel/bma400.yaml diff --git a/Documentation/devicetree/bindings/iio/accel/bma400.yaml b/Documentation/devicetree/bindings/iio/accel/bma400.yaml new file mode 100644 index ..31dceac89ace --- /dev/null +++ b/Documentation/devicetree/bindings/iio/accel/bma400.yaml @@ -0,0 +1,39 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iio/accel/bma400.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Bosch BMA400 triaxial acceleration sensor + +maintainers: + - Dan Robertson + +description: | + Acceleration and temerature iio sensors with an i2c interface + + Specifications about the sensor can be found at: + https://ae-bst.resource.bosch.com/media/_tech/media/datasheets/BST-BMA400-DS000.pdf + +properties: + compatible: +enum: + - bosch,bma400 + + reg: +maxItems: 1 + +required: + - compatible + - reg + +examples: + - | +i2c0 { + #address-cells = <1>; + #size-cells = <1>; + bma400@14 { +compatible = "bosch,bma400"; +reg = <0x14>; + }; +}; -- 2.23.0
[PATCH v3 2/2] iio: (bma400) add driver for the BMA400
Add a IIO driver for the Bosch BMA400 3-axes ultra-low power accelerometer. The driver supports reading from the acceleration and temperature registers. The driver also supports reading and configuring the output data rate, oversampling ratio, and scale. Signed-off-by: Dan Robertson --- drivers/iio/accel/Kconfig | 18 + drivers/iio/accel/Makefile | 2 + drivers/iio/accel/bma400.h | 80 drivers/iio/accel/bma400_core.c | 788 drivers/iio/accel/bma400_i2c.c | 60 +++ 5 files changed, 948 insertions(+) create mode 100644 drivers/iio/accel/bma400.h create mode 100644 drivers/iio/accel/bma400_core.c create mode 100644 drivers/iio/accel/bma400_i2c.c diff --git a/drivers/iio/accel/Kconfig b/drivers/iio/accel/Kconfig index 9b9656ce37e6..a1081b902d16 100644 --- a/drivers/iio/accel/Kconfig +++ b/drivers/iio/accel/Kconfig @@ -112,6 +112,24 @@ config BMA220 To compile this driver as a module, choose M here: the module will be called bma220_spi. +config BMA400 + tristate "Bosch BMA400 3-Axis Accelerometer Driver" + depends on I2C + select REGMAP + select BMA400_I2C if (I2C) + help + Say Y here if you want to build a driver for the Bosch BMA400 + triaxial acceleration sensor. + + To compile this driver as a module, choose M here: the + module will be called bma400_core and you will also get + bma400_i2c for I2C. 
+ +config BMA400_I2C + tristate + depends on BMA400 + select REGMAP_I2C + config BMC150_ACCEL tristate "Bosch BMC150 Accelerometer Driver" select IIO_BUFFER diff --git a/drivers/iio/accel/Makefile b/drivers/iio/accel/Makefile index 56bd0215e0d4..3a051cf37f40 100644 --- a/drivers/iio/accel/Makefile +++ b/drivers/iio/accel/Makefile @@ -14,6 +14,8 @@ obj-$(CONFIG_ADXL372_I2C) += adxl372_i2c.o obj-$(CONFIG_ADXL372_SPI) += adxl372_spi.o obj-$(CONFIG_BMA180) += bma180.o obj-$(CONFIG_BMA220) += bma220_spi.o +obj-$(CONFIG_BMA400) += bma400_core.o +obj-$(CONFIG_BMA400_I2C) += bma400_i2c.o obj-$(CONFIG_BMC150_ACCEL) += bmc150-accel-core.o obj-$(CONFIG_BMC150_ACCEL_I2C) += bmc150-accel-i2c.o obj-$(CONFIG_BMC150_ACCEL_SPI) += bmc150-accel-spi.o diff --git a/drivers/iio/accel/bma400.h b/drivers/iio/accel/bma400.h new file mode 100644 index ..e5fa57d1b97a --- /dev/null +++ b/drivers/iio/accel/bma400.h @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * bma400.h - Register constants and other forward declarations + *needed by the bma400 sources. 
+ * + * Copyright 2019 Dan Robertson + */ + +#include + +/* + * Read-Only Registers + */ + +/* Status and ID registers */ +#define BMA400_CHIP_ID_REG 0x00 +#define BMA400_ERR_REG 0x02 +#define BMA400_STATUS_REG 0x03 + +/* Acceleration registers */ +#define BMA400_X_AXIS_LSB_REG 0x04 +#define BMA400_X_AXIS_MSB_REG 0x05 +#define BMA400_Y_AXIS_LSB_REG 0x06 +#define BMA400_Y_AXIS_MSB_REG 0x07 +#define BMA400_Z_AXIS_LSB_REG 0x08 +#define BMA400_Z_AXIS_MSB_REG 0x09 + +/* Sensor time registers */ +#define BMA400_SENSOR_TIME0 0x0a +#define BMA400_SENSOR_TIME1 0x0b +#define BMA400_SENSOR_TIME2 0x0c + +/* Event and interrupt registers */ +#define BMA400_EVENT_REG0x0d +#define BMA400_INT_STAT0_REG0x0e +#define BMA400_INT_STAT1_REG0x0f +#define BMA400_INT_STAT2_REG0x10 + +/* Temperature register */ +#define BMA400_TEMP_DATA_REG0x11 + +/* FIFO length and data registers */ +#define BMA400_FIFO_LENGTH0_REG 0x12 +#define BMA400_FIFO_LENGTH1_REG 0x13 +#define BMA400_FIFO_DATA_REG0x14 + +/* Step count registers */ +#define BMA400_STEP_CNT0_REG0x15 +#define BMA400_STEP_CNT1_REG0x16 +#define BMA400_STEP_CNT3_REG0x17 +#define BMA400_STEP_STAT_REG0x18 + +/* + * Read-write configuration registers + */ +#define BMA400_ACC_CONFIG0_REG 0x19 +#define BMA400_ACC_CONFIG1_REG 0x1a +#define BMA400_ACC_CONFIG2_REG 0x1b +#define BMA400_CMD_REG 0x7e + +/* Chip ID of BMA 400 devices found in the chip ID register. 
*/ +#define BMA400_ID_REG_VAL 0x90 + +#define BMA400_TWO_BITS_MASK0x03 +#define BMA400_LP_OSR_MASK 0x60 +#define BMA400_NP_OSR_MASK 0x30 +#define BMA400_ACC_ODR_MASK 0x0f +#define BMA400_ACC_SCALE_MASK 0xc0 + +#define BMA400_LP_OSR_SHIFT 0x05 +#define BMA400_NP_OSR_SHIFT 0x04 +#define BMA400_SCALE_SHIFT 0x06 + +extern const struct regmap_config bma400_regmap_config; + +int bma400_probe(struct device *dev, +struct regmap *regmap, +const char *name); + +int bma400_remove(struct device *dev); diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c new file mode 100644 index ..1b19e69686ad --- /dev/null +++ b/drivers/iio/accel/bma400_core.c @@ -0,0 +1,788 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * bma400_core.c - Core IIO driver for Bosch BMA400 triaxial
[PATCH v3 0/2] iio: add driver for Bosch BMA400 accelerometer
This patchset adds an IIO driver for the Bosch BMA400 3-axes ultra
low-power accelerometer. The initial implementation of the driver adds
read support for the acceleration and temperature data registers. The
driver also has support for reading and writing to the output data rate,
oversampling ratio, and scale configuration registers.

Version 3 implements the feedback from reviewers of the v2 patchset.

Cheers,

 - Dan

Changes in v3:

 * Use yaml format for DT bindings
 * Remove strict dependency on OF
 * Tidy Kconfig dependencies
 * Stylistic changes
 * Do not soft-reset device on remove

Changes in v2:

 * Implemented iio_info -> read_avail
 * Stylistic changes
 * Implemented devicetree bindings

Dan Robertson (2):
  dt-bindings: iio: accel: bma400: add bindings
  iio: (bma400) add driver for the BMA400

 .../devicetree/bindings/iio/accel/bma400.yaml |  39 +
 drivers/iio/accel/Kconfig                     |  18 +
 drivers/iio/accel/Makefile                    |   2 +
 drivers/iio/accel/bma400.h                    |  80 ++
 drivers/iio/accel/bma400_core.c               | 788 ++
 drivers/iio/accel/bma400_i2c.c                |  60 ++
 6 files changed, 987 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iio/accel/bma400.yaml
 create mode 100644 drivers/iio/accel/bma400.h
 create mode 100644 drivers/iio/accel/bma400_core.c
 create mode 100644 drivers/iio/accel/bma400_i2c.c

--
2.23.0
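The configuration fields named in the cover letter (output data rate, oversampling ratio, scale) are selected with the masks and shifts that bma400.h defines in patch 2/2. A minimal userspace sketch of that mask-and-shift pattern follows. The constants are copied from the header; the helpers field_get() and field_set() are hypothetical names for illustration and are not taken from the driver, which accesses the registers through regmap. Which configuration register carries which field is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Masks and shifts as defined in bma400.h (patch 2/2). */
#define BMA400_LP_OSR_MASK	0x60
#define BMA400_NP_OSR_MASK	0x30
#define BMA400_ACC_ODR_MASK	0x0f
#define BMA400_ACC_SCALE_MASK	0xc0

#define BMA400_LP_OSR_SHIFT	0x05
#define BMA400_NP_OSR_SHIFT	0x04
#define BMA400_SCALE_SHIFT	0x06

/* Hypothetical helper: extract a field from a raw 8-bit register value. */
static uint8_t field_get(uint8_t reg, uint8_t mask, uint8_t shift)
{
	return (uint8_t)((reg & mask) >> shift);
}

/* Hypothetical helper: replace a field, leaving the other bits untouched. */
static uint8_t field_set(uint8_t reg, uint8_t mask, uint8_t shift, uint8_t val)
{
	unsigned int shifted = (unsigned int)val << shift;

	/* The new value must fit inside the field. */
	assert((shifted & ~(unsigned int)mask) == 0);

	return (uint8_t)((reg & ~(unsigned int)mask) | shifted);
}
```

With a raw register value of 0xc9, the scale field (mask 0xc0, shift 6) reads back as 3 and the output data rate field (mask 0x0f, no shift) as 9. In the kernel the same read-modify-write would normally be left to regmap rather than open-coded.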
Re: [PATCH net] net: sched: act_mirred: drop skb's dst_entry in ingress redirection
Hello!

On 10/12/2019 10:16 AM, Zhiyuan Hou wrote:

> In act_mirred's ingress redirection, if the skb's dst_entry is valid
> when call function netif_receive_skb, the fllowing l3 stack process

   Following or flowing?

> (ip_rcv_finish_core) will check dst_entry and skip the routing
> decision. Using the old dst_entry is unexpected and may discard the
> skb in some case. For example dst->dst_input points to dst_discard.
>
> This patch drops the skb's dst_entry before calling netif_receive_skb
> so that the skb can be made routing decision like a normal ingress
> skb.
>
> Signed-off-by: Zhiyuan Hou
> ---
>  net/sched/act_mirred.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9ce073a05414..6108a64c0cd5 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
[...]
> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>
>  	if (!want_ingress)
>  		err = dev_queue_xmit(skb2);
> -	else
> +	else {
> +		skb_dst_drop(skb2);
>  		err = netif_receive_skb(skb2);
> +	}

   If you introduce {} in one *if* branch, {} should be added to the other
branches as well, says CodingStyle.

[...]

MBR, Sergei
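The brace rule from the review can be shown outside the kernel. What follows is a minimal userspace sketch, not the act_mirred code: struct pkt, pkt_xmit(), pkt_receive() and pkt_redirect() are hypothetical stand-ins for the skb, dev_queue_xmit(), netif_receive_skb() and tcf_mirred_act(). Clearing the dst field models skb_dst_drop(). Because the else branch needs braces, the if branch gets them too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only what the example needs. */
struct pkt {
	const void *dst;	/* cached routing decision, possibly stale */
};

/* Models the egress path; a cached dst does not matter here. */
static int pkt_xmit(struct pkt *p)
{
	(void)p;
	return 0;
}

/* Models the ingress path; a stale dst makes the stack discard the packet. */
static int pkt_receive(struct pkt *p)
{
	return p->dst ? -1 : 0;
}

static int pkt_redirect(struct pkt *p, bool want_ingress)
{
	int err;

	assert(p != NULL);

	/* One branch has two statements, so both branches carry braces. */
	if (!want_ingress) {
		err = pkt_xmit(p);
	} else {
		p->dst = NULL;	/* models skb_dst_drop() */
		err = pkt_receive(p);
	}

	return err;
}
```

Calling pkt_redirect() with want_ingress set clears the cached dst before delivery, so the modelled ingress path makes a fresh decision; the egress path leaves dst untouched.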
Re: [PATCH RFC v1 0/2] vhost: ring format independence
On Sat, Oct 12, 2019 at 04:15:42PM +0800, Jason Wang wrote: > > On 2019/10/11 下午9:45, Michael S. Tsirkin wrote: > > So the idea is as follows: we convert descriptors to an > > independent format first, and process that converting to > > iov later. > > > > The point is that we have a tight loop that fetches > > descriptors, which is good for cache utilization. > > This will also allow all kind of batching tricks - > > e.g. it seems possible to keep SMAP disabled while > > we are fetching multiple descriptors. > > > I wonder this may help for performance: Could you try it out and report please? Would be very much appreciated. > - another indirection layer, increased footprint Seems to be offset off by improved batching. For sure will be even better if we can move stac/clac out, or replace some get/put user with bigger copy to/from. > - won't help or even degrade when there's no batch I couldn't measure a difference. I'm guessing > - an extra overhead in the case of in order where we should already had > tight loop it's not so tight with translation in there. this exactly makes the loop tight. > - need carefully deal with indirect and chain or make it only work for > packet sit just in a single descriptor > > Thanks I don't understand this last comment. > > > > > And perhaps more importantly, this is a very good fit for the packed > > ring layout, where we get and put descriptors in order. > > > > This patchset seems to already perform exactly the same as the original > > code already based on a microbenchmark. More testing would be very much > > appreciated. > > > > Biggest TODO before this first step is ready to go in is to > > batch indirect descriptors as well. > > > > Integrating into vhost-net is basically > > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ - > > or add a module parameter like I did in the test module. > > > > > > > > Michael S. 
Tsirkin (2): > >vhost: option to fetch descriptors through an independent struct > >vhost: batching fetches > > > > drivers/vhost/test.c | 19 ++- > > drivers/vhost/vhost.c | 333 +- > > drivers/vhost/vhost.h | 20 ++- > > 3 files changed, 365 insertions(+), 7 deletions(-) > >
[PATCH RFC v2 1/2] vhost: option to fetch descriptors through an independent struct
The idea is to support multiple ring formats by converting to a format-independent array of descriptors. This costs extra cycles, but we gain in ability to fetch a batch of descriptors in one go, which is good for code cache locality. To simplify benchmarking, I kept the old code around so one can switch back and forth by writing into a module parameter. This will go away in the final submission. This patch causes a minor performance degradation, it's been kept as simple as possible for ease of review. Next patch gets us back the performance by adding batching. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/test.c | 17 ++- drivers/vhost/vhost.c | 299 +- drivers/vhost/vhost.h | 16 +++ 3 files changed, 327 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 056308008288..39a018a7af2d 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -18,6 +18,9 @@ #include "test.h" #include "vhost.h" +static int newcode = 0; +module_param(newcode, int, 0644); + /* Max number of bytes transferred before requeueing the job. * Using this limit prevents one virtqueue from starving others. */ #define VHOST_TEST_WEIGHT 0x8 @@ -58,10 +61,16 @@ static void handle_vq(struct vhost_test *n) vhost_disable_notify(>dev, vq); for (;;) { - head = vhost_get_vq_desc(vq, vq->iov, -ARRAY_SIZE(vq->iov), -, , -NULL, NULL); + if (newcode) + head = vhost_get_vq_desc_batch(vq, vq->iov, + ARRAY_SIZE(vq->iov), + , , + NULL, NULL); + else + head = vhost_get_vq_desc(vq, vq->iov, +ARRAY_SIZE(vq->iov), +, , +NULL, NULL); /* On error, stop handling until the next kick. 
*/ if (unlikely(head < 0)) break; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 36ca2cf419bf..36661d6cb51f 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -301,6 +301,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, struct vhost_virtqueue *vq) { vq->num = 1; + vq->ndescs = 0; vq->desc = NULL; vq->avail = NULL; vq->used = NULL; @@ -369,6 +370,9 @@ static int vhost_worker(void *data) static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq) { + kfree(vq->descs); + vq->descs = NULL; + vq->max_descs = 0; kfree(vq->indirect); vq->indirect = NULL; kfree(vq->log); @@ -385,6 +389,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; + vq->max_descs = dev->iov_limit; + vq->descs = kmalloc_array(vq->max_descs, + sizeof(*vq->descs), + GFP_KERNEL); vq->indirect = kmalloc_array(UIO_MAXIOV, sizeof(*vq->indirect), GFP_KERNEL); @@ -392,7 +400,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) GFP_KERNEL); vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads), GFP_KERNEL); - if (!vq->indirect || !vq->log || !vq->heads) + if (!vq->indirect || !vq->log || !vq->heads || !vq->descs) goto err_nomem; } return 0; @@ -2346,6 +2354,295 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq, } EXPORT_SYMBOL_GPL(vhost_get_vq_desc); +static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq) +{ + BUG_ON(!vq->ndescs); + return >descs[vq->ndescs - 1]; +} + +static void pop_split_desc(struct vhost_virtqueue *vq) +{ + BUG_ON(!vq->ndescs); + --vq->ndescs; +} + +static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id) +{ + struct vhost_desc *h; + + if (unlikely(vq->ndescs >= vq->max_descs)) + return -EINVAL; + h = >descs[vq->ndescs++]; + h->addr = vhost64_to_cpu(vq, desc->addr); + h->len = vhost32_to_cpu(vq, desc->len); + h->flags = vhost16_to_cpu(vq, desc->flags); + h->id = id; + + return 0; +} + +static int fetch_indirect_descs(struct 
vhost_virtqueue *vq, + struct vhost_desc *indirect, + u16 head) +{ +
[PATCH RFC v2 2/2] vhost: batching fetches
With this patch applied, new and old code perform identically. Lots of extra optimizations are now possible, e.g. we can fetch multiple heads with copy_from/to_user now. We can get rid of maintaining the log array. Etc etc. Signed-off-by: Michael S. Tsirkin --- drivers/vhost/test.c | 2 +- drivers/vhost/vhost.c | 50 --- drivers/vhost/vhost.h | 4 +++- 3 files changed, 46 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index 39a018a7af2d..e3a8e9db22cd 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -128,7 +128,7 @@ static int vhost_test_open(struct inode *inode, struct file *f) dev = >dev; vqs[VHOST_TEST_VQ] = >vqs[VHOST_TEST_VQ]; n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick; - vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV, + vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64, VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT); f->private_data = n; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 36661d6cb51f..50d4a148d60d 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -302,6 +302,7 @@ static void vhost_vq_reset(struct vhost_dev *dev, { vq->num = 1; vq->ndescs = 0; + vq->first_desc = 0; vq->desc = NULL; vq->avail = NULL; vq->used = NULL; @@ -390,6 +391,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev) for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; vq->max_descs = dev->iov_limit; + vq->batch_descs = dev->iov_limit - UIO_MAXIOV; vq->descs = kmalloc_array(vq->max_descs, sizeof(*vq->descs), GFP_KERNEL); @@ -2366,6 +2368,8 @@ static void pop_split_desc(struct vhost_virtqueue *vq) --vq->ndescs; } +#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \ + VRING_DESC_F_NEXT) static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id) { struct vhost_desc *h; @@ -2375,7 +2379,7 @@ static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, h = >descs[vq->ndescs++]; h->addr = 
vhost64_to_cpu(vq, desc->addr); h->len = vhost32_to_cpu(vq, desc->len); - h->flags = vhost16_to_cpu(vq, desc->flags); + h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS; h->id = id; return 0; @@ -2450,7 +2454,7 @@ static int fetch_indirect_descs(struct vhost_virtqueue *vq, return 0; } -static int fetch_descs(struct vhost_virtqueue *vq) +static int fetch_buf(struct vhost_virtqueue *vq) { struct vring_desc desc; unsigned int i, head, found = 0; @@ -2462,7 +2466,11 @@ static int fetch_descs(struct vhost_virtqueue *vq) /* Check it isn't doing very strange things with descriptor numbers. */ last_avail_idx = vq->last_avail_idx; - if (vq->avail_idx == vq->last_avail_idx) { + if (unlikely(vq->avail_idx == vq->last_avail_idx)) { + /* If we already have work to do, don't bother re-checking. */ + if (likely(vq->ndescs)) + return vq->num; + if (unlikely(vhost_get_avail_idx(vq, _idx))) { vq_err(vq, "Failed to access avail idx at %p\n", >avail->idx); @@ -2541,6 +2549,24 @@ static int fetch_descs(struct vhost_virtqueue *vq) return 0; } +static int fetch_descs(struct vhost_virtqueue *vq) +{ + int ret = 0; + + if (unlikely(vq->first_desc >= vq->ndescs)) { + vq->first_desc = 0; + vq->ndescs = 0; + } + + if (vq->ndescs) + return 0; + + while (!ret && vq->ndescs <= vq->batch_descs) + ret = fetch_buf(vq); + + return vq->ndescs ? 0 : ret; +} + /* This looks in the virtqueue and for the first available buffer, and converts * it to an iovec for convenient access. 
Since descriptors consist of some * number of output then some number of input descriptors, it's actually two @@ -2562,6 +2588,8 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq, if (ret) return ret; + /* Note: indirect descriptors are not batched */ + /* TODO: batch up to a limit */ last = peek_split_desc(vq); id = last->id; @@ -2584,12 +2612,12 @@ int vhost_get_vq_desc_batch(struct vhost_virtqueue *vq, if (unlikely(log)) *log_num = 0; - for (i = 0; i < vq->ndescs; ++i) { + for (i = vq->first_desc; i < vq->ndescs; ++i) { unsigned iov_count = *in_num + *out_num; struct vhost_desc *desc = >descs[i]; int access; - if (desc->flags &
[PATCH RFC v2 0/2] vhost: ring format independence
This adds infrastructure required for supporting multiple ring formats.

The idea is as follows: we convert descriptors to an independent format
first, and process that converting to iov later.

The point is that we have a tight loop that fetches descriptors, which is
good for cache utilization. This will also allow all kinds of batching
tricks - e.g. it seems possible to keep SMAP disabled while we are
fetching multiple descriptors.

Based on a microbenchmark, this already seems to perform exactly the same
as the original code. More testing would be very much appreciated.

Biggest TODO before this first step is ready to go in is to batch
indirect descriptors as well.

Integrating into vhost-net is basically
s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ - or add a module parameter
like I did in the test module.

Changes from v1:
	- typo fixes

Michael S. Tsirkin (2):
  vhost: option to fetch descriptors through an independent struct
  vhost: batching fetches

 drivers/vhost/test.c  |  19 ++-
 drivers/vhost/vhost.c | 333 +-
 drivers/vhost/vhost.h |  20 ++-
 3 files changed, 365 insertions(+), 7 deletions(-)

--
MST
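The "convert first, process later" idea from the cover letter can be sketched in userspace. This is an illustration under stated assumptions, not the vhost code: struct ring_desc is modelled on the split ring's struct vring_desc, struct indep_desc on the struct vhost_desc that patch 1/2 introduces, and struct queue, push_desc(), fetch_batch() and MAX_DESCS are hypothetical names. The endianness conversion that the kernel code performs while copying is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DESCS 4	/* illustrative capacity, not a vhost constant */

/* Ring-specific layout, modelled on the split ring descriptor. */
struct ring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Format-independent copy, modelled on the patch's struct vhost_desc. */
struct indep_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t id;
};

struct queue {
	struct indep_desc descs[MAX_DESCS];
	unsigned int ndescs;
};

/* Convert one ring descriptor and append it; fails when the array is full. */
static int push_desc(struct queue *q, const struct ring_desc *d, uint16_t id)
{
	struct indep_desc *h;

	assert(q != NULL && d != NULL);

	if (q->ndescs >= MAX_DESCS)
		return -EINVAL;

	h = &q->descs[q->ndescs++];
	h->addr = d->addr;	/* the kernel code also converts endianness here */
	h->len = d->len;
	h->flags = d->flags;
	h->id = id;

	return 0;
}

/* Tight fetch loop: convert up to n ring entries, return how many fit. */
static unsigned int fetch_batch(struct queue *q, const struct ring_desc *ring,
				unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (push_desc(q, &ring[i], (uint16_t)i))
			break;
	}

	return i;
}
```

The fetch loop touches only the ring and the descriptor array, so translation to iovecs can happen afterwards over the already-fetched batch. That separation is what the cover letter credits for the better cache behaviour.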
[PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults
Add a new sysctl knob unprivileged_userfaultfd_user_mode_only. This sysctl can be set to either zero or one. When zero (the default) the system lets all users call userfaultfd with or without UFFD_USER_MODE_ONLY, modulo other access controls. When unprivileged_userfaultfd_user_mode_only is set to one, users without CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with EPERM. This facility allows administrators to reduce the likelihood that an attacker with access to userfaultfd can delay faulting kernel code to widen timing windows for other exploits. Signed-off-by: Daniel Colascione --- Documentation/admin-guide/sysctl/vm.rst | 13 + fs/userfaultfd.c| 12 ++-- include/linux/userfaultfd_k.h | 1 + kernel/sysctl.c | 9 + 4 files changed, 33 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 6664eec7bd35..330fd82b3f4e 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -849,6 +849,19 @@ they pass the UFFD_SECURE, enabling MAC security checks. The default value is 1. +unprivileged_userfaultfd_user_mode_only + + +This flag controls whether unprivileged users can use the userfaultfd +system calls to handle page faults in kernel mode. If set to zero, +userfaultfd works with or without UFFD_USER_MODE_ONLY, modulo +unprivileged_userfaultfd above. If set to one, users without +SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd +to succeed. Prohibiting use of userfaultfd for handling faults from +kernel mode may make certain vulnerabilities more difficult +to exploit. + +The default value is 0. 
user_reserve_kbytes === diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index aaed9347973e..02addd425ab7 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -29,6 +29,7 @@ #include int sysctl_unprivileged_userfaultfd __read_mostly = 1; +int sysctl_unprivileged_userfaultfd_user_mode_only __read_mostly = 0; static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly; @@ -1963,8 +1964,15 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) struct userfaultfd_ctx *ctx; int fd; static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY; - bool need_cap_check = sysctl_unprivileged_userfaultfd == 0 || - (sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE)); + bool need_cap_check = false; + + if (sysctl_unprivileged_userfaultfd == 0 || + (sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE))) + need_cap_check = true; + + if (sysctl_unprivileged_userfaultfd_user_mode_only && + (flags & UFFD_USER_MODE_ONLY) == 0) + need_cap_check = true; if (need_cap_check && !capable(CAP_SYS_PTRACE)) return -EPERM; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 549c8b0cca52..efe14abb2dc8 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -29,6 +29,7 @@ #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS) extern int sysctl_unprivileged_userfaultfd; +extern int sysctl_unprivileged_userfaultfd_user_mode_only; extern const struct file_operations userfaultfd_fops; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index fc98d5df344e..4f296676c0ac 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1740,6 +1740,15 @@ static struct ctl_table vm_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = , }, + { + .procname = "unprivileged_userfaultfd_user_mode_only", + .data = _unprivileged_userfaultfd_user_mode_only, + .maxlen = sizeof(sysctl_unprivileged_userfaultfd_user_mode_only), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, #endif { } }; -- 
2.23.0.700.g56cf767bdb-goog
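The combined effect of the two sysctls on the capability check in patches 6/7 and 7/7 can be written out as a small userspace model. This is a sketch, not the kernel code: the flag values below are placeholders chosen for illustration, struct uffd_sysctls and both helper functions are hypothetical names, and the CAP_SYS_PTRACE test is reduced to a boolean argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for illustration only. */
#define UFFD_SECURE		0x1
#define UFFD_USER_MODE_ONLY	0x2

/* Hypothetical container for the two sysctl knobs. */
struct uffd_sysctls {
	int unprivileged_userfaultfd;			/* 0, 1 or 2 */
	int unprivileged_userfaultfd_user_mode_only;	/* 0 or 1 */
};

/* Mirrors the need_cap_check logic from the userfaultfd() hunk in 7/7. */
static bool uffd_need_cap_check(const struct uffd_sysctls *s, int flags)
{
	assert(s != NULL);

	if (s->unprivileged_userfaultfd == 0)
		return true;
	if (s->unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE))
		return true;
	if (s->unprivileged_userfaultfd_user_mode_only &&
	    !(flags & UFFD_USER_MODE_ONLY))
		return true;

	return false;
}

/* Returns 0 if the call may proceed, -EPERM otherwise. */
static int uffd_permission(const struct uffd_sysctls *s, int flags,
			   bool has_cap_sys_ptrace)
{
	if (uffd_need_cap_check(s, flags) && !has_cap_sys_ptrace)
		return -EPERM;

	return 0;
}
```

With the defaults (1 and 0) every caller passes. Setting unprivileged_userfaultfd_user_mode_only to 1 makes a caller without the capability fail unless UFFD_USER_MODE_ONLY is passed, and setting unprivileged_userfaultfd to 2 does the same for UFFD_SECURE. The two conditions are independent, so a caller may have to pass both flags.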
Re: [PATCH 1/2] drm/imx: Fix error handling for a kmemdup() call in imx_pd_bind()
On Sat, Oct 12, 2019 at 4:07 AM Markus Elfring wrote:
>
> From: Markus Elfring
> Date: Sat, 12 Oct 2019 10:30:21 +0200
>
> The return value from a call of the function “kmemdup” was not checked
> in this function implementation. Thus add the corresponding error handling.
>
> Fixes: 19022aaae677dfa171a719e9d1ff04823ce65a65 ("staging: drm/imx: Add parallel display support")
> Signed-off-by: Markus Elfring
> ---
>  drivers/gpu/drm/imx/parallel-display.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/imx/parallel-display.c b/drivers/gpu/drm/imx/parallel-display.c
> index 35518e5de356..39c4798f56b6 100644
> --- a/drivers/gpu/drm/imx/parallel-display.c
> +++ b/drivers/gpu/drm/imx/parallel-display.c
> @@ -210,8 +210,13 @@ static int imx_pd_bind(struct device *dev, struct device *master, void *data)
>  		return -ENOMEM;
>
>  	edidp = of_get_property(np, "edid", &imxpd->edid_len);
> -	if (edidp)
> +	if (edidp) {
>  		imxpd->edid = kmemdup(edidp, imxpd->edid_len, GFP_KERNEL);
> +		if (!imxpd->edid) {
> +			devm_kfree(dev, imxpd);

You should not try to free imxpd here as it is a resource-managed
allocation via devm_kzalloc(). It means memory allocated with this
function is automatically freed on driver detach. So, this patch
introduces a double-free.

> +			return -ENOMEM;
> +		}
> +	}
>
>  	ret = of_property_read_string(np, "interface-pix-fmt", );
>  	if (!ret) {
> --
> 2.23.0
>

--
Navid.
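The resource-managed lifetime mentioned in the reply can be modelled in userspace. This is a sketch under stated assumptions, not the devres implementation: struct owner, managed_zalloc(), owner_release() and panel_bind() are hypothetical names standing in for the device, devm_kzalloc(), driver detach and imx_pd_bind(). It shows only the error path returning without a manual free and the owner releasing the allocation exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RES 8	/* illustrative limit for the model */

/* Hypothetical owner: tracks allocations that are released together. */
struct owner {
	void *res[MAX_RES];
	size_t nres;
	size_t freed;	/* number of allocations released so far */
};

/* Models devm_kzalloc(): zeroed memory whose lifetime is tied to the owner. */
static void *managed_zalloc(struct owner *o, size_t size)
{
	void *p;

	assert(o != NULL);

	if (o->nres >= MAX_RES)
		return NULL;

	p = calloc(1, size);
	if (p)
		o->res[o->nres++] = p;

	return p;
}

/* Models detach: every managed allocation is freed exactly once. */
static void owner_release(struct owner *o)
{
	while (o->nres) {
		free(o->res[--o->nres]);
		o->freed++;
	}
}

struct panel {
	void *edid;
	size_t edid_len;
};

/* Models the bind path; dup_fails simulates a failed kmemdup(). */
static int panel_bind(struct owner *o, const void *edid, size_t len,
		      int dup_fails)
{
	struct panel *p = managed_zalloc(o, sizeof(*p));

	if (!p)
		return -ENOMEM;

	if (edid) {
		p->edid = dup_fails ? NULL : malloc(len);
		if (!p->edid)
			return -ENOMEM;	/* no manual free: the owner releases p */
		memcpy(p->edid, edid, len);
		p->edid_len = len;	/* the copy itself would be freed on unbind */
	}

	return 0;
}
```

After a failed panel_bind() the panel structure is still registered with the owner, and a single owner_release() call frees it. The error path therefore needs only the return statement.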
Re: Linux 5.3.6
> I'm announcing the release of the 5.3.6 kernel.

5.3.6 build fails here with:

arch/x86/entry/vdso/vdso64.so.dbg: undefined symbols found
  CC      arch/x86/kernel/cpu/mce/threshold.o
make[3]: *** [arch/x86/entry/vdso/Makefile:59: arch/x86/entry/vdso/vdso64.so.dbg] Error 1
make[3]: *** Deleting file 'arch/x86/entry/vdso/vdso64.so.dbg'
make[2]: *** [scripts/Makefile.build:497: arch/x86/entry/vdso] Error 2
make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2
make[1]: *** Waiting for unfinished jobs

Chris Clayton
[PATCH 6/7] Allow users to require UFFD_SECURE
This change adds 2 as an allowable value for unprivileged_userfaultfd. (Previously, this sysctl could be either 0 or 1.) When unprivileged_userfaultfd is 2, users with CAP_SYS_PTRACE may create userfaultfd with or without UFFD_SECURE, but users without CAP_SYS_PTRACE must pass UFFD_SECURE to userfaultfd in order for the system call to succeed, effectively forcing them to opt into additional security checks. Signed-off-by: Daniel Colascione --- Documentation/admin-guide/sysctl/vm.rst | 6 -- fs/userfaultfd.c| 4 +++- kernel/sysctl.c | 2 +- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 64aeee1009ca..6664eec7bd35 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -842,8 +842,10 @@ unprivileged_userfaultfd This flag controls whether unprivileged users can use the userfaultfd system calls. Set this to 1 to allow unprivileged users to use the -userfaultfd system calls, or set this to 0 to restrict userfaultfd to only -privileged users (with SYS_CAP_PTRACE capability). +userfaultfd system calls, or set this to 0 to restrict userfaultfd to +only privileged users (with SYS_CAP_PTRACE capability). If set to 2, +unprivileged (non-SYS_CAP_PTRACE) users may use userfaultfd only if +they pass the UFFD_SECURE, enabling MAC security checks. The default value is 1. 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 986d23b2cd33..aaed9347973e 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1963,8 +1963,10 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) struct userfaultfd_ctx *ctx; int fd; static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY; + bool need_cap_check = sysctl_unprivileged_userfaultfd == 0 || + (sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE)); - if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE)) + if (need_cap_check && !capable(CAP_SYS_PTRACE)) return -EPERM; BUG_ON(!current->mm); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 00fcea236eba..fc98d5df344e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1738,7 +1738,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, + .extra2 = , }, #endif { } -- 2.23.0.700.g56cf767bdb-goog
[PATCH 4/7] Teach SELinux about a new userfaultfd class
Use the secure anonymous inode LSM hook we just added to let SELinux policy place restrictions on userfaultfd use. The create operation applies to processes creating new instances of these file objects; transfer between processes is covered by restrictions on read, write, and ioctl access already checked inside selinux_file_receive. Signed-off-by: Daniel Colascione --- fs/userfaultfd.c| 4 +- include/linux/userfaultfd_k.h | 2 + security/selinux/hooks.c| 68 + security/selinux/include/classmap.h | 2 + 4 files changed, 73 insertions(+), 3 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 29f920fb236e..1123089c3d55 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1014,8 +1014,6 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait) } } -static const struct file_operations userfaultfd_fops; - static int resolve_userfault_fork(struct userfaultfd_ctx *ctx, struct userfaultfd_ctx *new, struct uffd_msg *msg) @@ -1934,7 +1932,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f) } #endif -static const struct file_operations userfaultfd_fops = { +const struct file_operations userfaultfd_fops = { #ifdef CONFIG_PROC_FS .show_fdinfo= userfaultfd_show_fdinfo, #endif diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index ac9d71e24b81..549c8b0cca52 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -30,6 +30,8 @@ extern int sysctl_unprivileged_userfaultfd; +extern const struct file_operations userfaultfd_fops; + extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason); extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 9625b99e677f..0b3a36cbfbdc 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -92,6 +92,10 @@ #include #include +#ifdef CONFIG_USERFAULTFD +#include +#endif + #include "avc.h" #include "objsec.h" #include 
"netif.h" @@ -2943,6 +2947,69 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir, return 0; } +static int selinux_inode_init_security_anon(struct inode *inode, + const char *name, + const struct file_operations *fops) +{ + const struct task_security_struct *tsec = selinux_cred(current_cred()); + struct common_audit_data ad; + struct inode_security_struct *isec; + + if (unlikely(IS_PRIVATE(inode))) + return 0; + + /* +* We shouldn't be creating secure anonymous inodes before LSM +* initialization completes. +*/ + if (unlikely(!selinux_state.initialized)) + return -EBUSY; + + isec = selinux_inode(inode); + + /* +* We only get here once per ephemeral inode. The inode has +* been initialized via inode_alloc_security but is otherwise +* untouched, so check that the state is as +* inode_alloc_security left it. +*/ + BUG_ON(isec->initialized != LABEL_INVALID); + BUG_ON(isec->sclass != SECCLASS_FILE); + +#ifdef CONFIG_USERFAULTFD + if (fops == &userfaultfd_fops) + isec->sclass = SECCLASS_UFFD; +#endif + + if (isec->sclass == SECCLASS_FILE) { + printk(KERN_WARNING "refusing to create secure anonymous inode " + "of unknown type"); + return -EOPNOTSUPP; + } + /* +* Always give secure anonymous inodes the sid of the +* creating task. +*/ + + isec->sid = tsec->sid; + isec->initialized = LABEL_INITIALIZED; + + /* +* Now that we've initialized security, check whether we're +* allowed to actually create this type of anonymous inode. 
+*/ + + ad.type = LSM_AUDIT_DATA_INODE; + ad.u.inode = inode; + + return avc_has_perm(&selinux_state, + tsec->sid, + isec->sid, + isec->sclass, + FILE__CREATE, + &ad); +} + static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode) { return may_create(dir, dentry, SECCLASS_FILE); @@ -6840,6 +6907,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(inode_alloc_security, selinux_inode_alloc_security), LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security), LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security), + LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon), LSM_HOOK_INIT(inode_create, selinux_inode_create),
[PATCH 2/7] Add a concept of a "secure" anonymous file
A secure anonymous file is one we hooked up to its own inode (as opposed to the shared inode we use for non-secure anonymous files). A new LSM hook gives security modules a chance to initialize, label, and veto the creation of these secure anonymous files. Security modules had limited ability to interact with non-secure anonymous files due to all of these files sharing a single inode. Signed-off-by: Daniel Colascione --- fs/anon_inodes.c | 45 ++- include/linux/lsm_hooks.h | 8 +++ include/linux/security.h | 2 ++ security/security.c | 8 +++ 4 files changed, 53 insertions(+), 10 deletions(-) diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index caa36019afca..d68d76523ad3 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -55,6 +55,23 @@ static struct file_system_type anon_inode_fs_type = { .kill_sb= kill_anon_super, }; +struct inode *anon_inode_make_secure_inode(const char *name, + const struct file_operations *fops) +{ + struct inode *inode; + int error; + inode = alloc_anon_inode(anon_inode_mnt->mnt_sb); + if (IS_ERR(inode)) + return ERR_PTR(PTR_ERR(inode)); + inode->i_flags &= ~S_PRIVATE; + error = security_inode_init_security_anon(inode, name, fops); + if (error) { + iput(inode); + return ERR_PTR(error); + } + return inode; +} + /** * anon_inode_getfile2 - creates a new file instance by hooking it up to * an anonymous inode, and a dentry that describe @@ -72,7 +89,9 @@ static struct file_system_type anon_inode_fs_type = { * hence saving memory and avoiding code duplication for the file/inode/dentry * setup. Returns the newly created file* or an error pointer. * - * anon_inode_flags must be zero. + * If anon_inode_flags contains ANON_INODE_SECURE, create a new inode + * and enable security checks for it. Otherwise, attach a new file to + * a singleton placeholder inode with security checks disabled. 
*/ struct file *anon_inode_getfile2(const char *name, const struct file_operations *fops, @@ -81,17 +100,23 @@ struct file *anon_inode_getfile2(const char *name, struct inode *inode; struct file *file; - if (anon_inode_flags) + if (anon_inode_flags & ~ANON_INODE_SECURE) return ERR_PTR(-EINVAL); - inode = anon_inode_inode; - if (IS_ERR(inode)) - return ERR_PTR(-ENODEV); - /* -* We know the anon_inode inode count is always -* greater than zero, so ihold() is safe. -*/ - ihold(inode); + if (anon_inode_flags & ANON_INODE_SECURE) { + inode = anon_inode_make_secure_inode(name, fops); + if (IS_ERR(inode)) + return ERR_PTR(PTR_ERR(inode)); + } else { + inode = anon_inode_inode; + if (IS_ERR(inode)) + return ERR_PTR(-ENODEV); + /* +* We know the anon_inode inode count is always +* greater than zero, so ihold() is safe. +*/ + ihold(inode); + } if (fops->owner && !try_module_get(fops->owner)) { file = ERR_PTR(-ENOENT); diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index a3763247547c..3744ce9e9172 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -215,6 +215,10 @@ * Returns 0 if @name and @value have been successfully set, * -EOPNOTSUPP if no security attribute is needed, or * -ENOMEM on memory allocation failure. + * @inode_init_security_anon: + * Set up a secure anonymous inode. + * Returns 0 on success. Returns -EPERM if the security module denies + * the creation of this inode. * @inode_create: * Check permission to create a regular file. * @dir contains inode structure of the parent of the new file. 
@@ -1552,6 +1556,9 @@ union security_list_options { const struct qstr *qstr, const char **name, void **value, size_t *len); + int (*inode_init_security_anon)(struct inode *inode, + const char *name, + const struct file_operations *fops); int (*inode_create)(struct inode *dir, struct dentry *dentry, umode_t mode); int (*inode_link)(struct dentry *old_dentry, struct inode *dir, @@ -1876,6 +1883,7 @@ struct security_hook_heads { struct hlist_head inode_alloc_security; struct hlist_head inode_free_security; struct hlist_head inode_init_security; + struct hlist_head inode_init_security_anon; struct hlist_head inode_create;
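The choice between the shared inode and a per-file inode described in this patch can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the kernel code: `struct model_inode`, `model_get_inode`, `hook_allow` and `hook_deny` are hypothetical names, and the value of `ANON_INODE_SECURE` is assumed because the header defining it is not part of this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define ANON_INODE_SECURE 1	/* assumed value; the defining header is not shown */

/* Simplified, hypothetical stand-in for struct inode. */
struct model_inode {
	int refcount;
	int is_private;		/* models the S_PRIVATE flag */
};

/* The singleton inode shared by all non-secure anonymous files. */
static struct model_inode shared_inode = { .refcount = 1, .is_private = 1 };

/* Stand-in for the new LSM hook: 0 allows, a negative errno vetoes. */
typedef int (*anon_init_hook)(struct model_inode *inode, const char *name);

static int hook_allow(struct model_inode *inode, const char *name)
{
	(void)inode;
	(void)name;
	return 0;
}

static int hook_deny(struct model_inode *inode, const char *name)
{
	(void)inode;
	(void)name;
	return -EPERM;
}

/*
 * Return the inode a new anonymous file would be attached to.
 * On failure, return NULL and store a negative errno in *err.
 */
static struct model_inode *model_get_inode(const char *name,
					    int anon_inode_flags,
					    anon_init_hook hook, int *err)
{
	struct model_inode *inode;

	assert(name != NULL && hook != NULL && err != NULL);
	*err = 0;

	/* Reject any flag bit other than ANON_INODE_SECURE. */
	if (anon_inode_flags & ~ANON_INODE_SECURE) {
		*err = -EINVAL;
		return NULL;
	}

	/* Non-secure path: take another reference on the shared inode. */
	if (!(anon_inode_flags & ANON_INODE_SECURE)) {
		shared_inode.refcount++;
		return &shared_inode;
	}

	/* Secure path: fresh inode, not private, so the hook can see it. */
	inode = calloc(1, sizeof(*inode));
	if (!inode) {
		*err = -ENOMEM;
		return NULL;
	}
	inode->refcount = 1;
	inode->is_private = 0;

	*err = hook(inode, name);
	if (*err) {
		free(inode);	/* the hook vetoed creation */
		return NULL;
	}
	return inode;
}
```

In this model the hook only ever runs on the secure path, which mirrors why existing callers keep their current behaviour unless they opt in.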
[PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
Add functions forwarding from the old names to the new ones so we don't need to change any callers. Signed-off-by: Daniel Colascione --- fs/anon_inodes.c| 62 ++--- include/linux/anon_inodes.h | 27 +--- 2 files changed, 59 insertions(+), 30 deletions(-) diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 89714308c25b..caa36019afca 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -56,60 +56,71 @@ static struct file_system_type anon_inode_fs_type = { }; /** - * anon_inode_getfile - creates a new file instance by hooking it up to an - * anonymous inode, and a dentry that describe the "class" - * of the file + * anon_inode_getfile2 - creates a new file instance by hooking it up to + * an anonymous inode, and a dentry that describe + * the "class" of the file * * @name:[in]name of the "class" of the new file * @fops:[in]file operations for the new file * @priv:[in]private data for the new file (will be file's private_data) - * @flags: [in]flags + * @flags: [in]flags for the file + * @anon_inode_flags: [in] flags for anon_inode* * - * Creates a new file by hooking it on a single inode. This is useful for files + * Creates a new file by hooking it on an unspecified inode. This is useful for files * that do not need to have a full-fledged inode in order to operate correctly. * All the files created with anon_inode_getfile() will share a single inode, * hence saving memory and avoiding code duplication for the file/inode/dentry * setup. Returns the newly created file* or an error pointer. + * + * anon_inode_flags must be zero. 
*/ -struct file *anon_inode_getfile(const char *name, - const struct file_operations *fops, - void *priv, int flags) +struct file *anon_inode_getfile2(const char *name, +const struct file_operations *fops, +void *priv, int flags, int anon_inode_flags) { + struct inode *inode; struct file *file; - if (IS_ERR(anon_inode_inode)) - return ERR_PTR(-ENODEV); - - if (fops->owner && !try_module_get(fops->owner)) - return ERR_PTR(-ENOENT); + if (anon_inode_flags) + return ERR_PTR(-EINVAL); + inode = anon_inode_inode; + if (IS_ERR(inode)) + return ERR_PTR(-ENODEV); /* -* We know the anon_inode inode count is always greater than zero, -* so ihold() is safe. +* We know the anon_inode inode count is always +* greater than zero, so ihold() is safe. */ - ihold(anon_inode_inode); - file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name, + ihold(inode); + + if (fops->owner && !try_module_get(fops->owner)) { + file = ERR_PTR(-ENOENT); + goto err; + } + + file = alloc_file_pseudo(inode, anon_inode_mnt, name, flags & (O_ACCMODE | O_NONBLOCK), fops); if (IS_ERR(file)) goto err; - file->f_mapping = anon_inode_inode->i_mapping; + file->f_mapping = inode->i_mapping; file->private_data = priv; return file; err: - iput(anon_inode_inode); + iput(inode); module_put(fops->owner); return file; } EXPORT_SYMBOL_GPL(anon_inode_getfile); +EXPORT_SYMBOL_GPL(anon_inode_getfile2); /** - * anon_inode_getfd - creates a new file instance by hooking it up to an - *anonymous inode, and a dentry that describe the "class" - *of the file + * anon_inode_getfd2 - creates a new file instance by hooking it up to an + * anonymous inode, and a dentry that describe the "class" + * of the file * * @name:[in]name of the "class" of the new file * @fops:[in]file operations for the new file @@ -122,8 +133,8 @@ EXPORT_SYMBOL_GPL(anon_inode_getfile); * hence saving memory and avoiding code duplication for the file/inode/dentry * setup. Returns new descriptor or an error code. 
*/ -int anon_inode_getfd(const char *name, const struct file_operations *fops, -void *priv, int flags) +int anon_inode_getfd2(const char *name, const struct file_operations *fops, + void *priv, int flags, int anon_inode_flags) { int error, fd; struct file *file; @@ -133,7 +144,7 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops, return error; fd = error; - file = anon_inode_getfile(name, fops, priv, flags); + file = anon_inode_getfile2(name, fops, priv, flags, anon_inode_flags); if (IS_ERR(file)) { error = PTR_ERR(file); goto
[PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
The new secure flag makes userfaultfd use a new "secure" anonymous file object instead of the default one, letting security modules supervise userfaultfd use. Requiring that users pass a new flag lets us avoid changing the semantics for existing callers. Signed-off-by: Daniel Colascione --- fs/userfaultfd.c | 28 +--- include/uapi/linux/userfaultfd.h | 8 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index f9fd18670e22..29f920fb236e 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1022,6 +1022,13 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx, { int fd; + /* +* Using a secure-mode UFFD to monitor forks isn't supported +* right now. +*/ + if (new->flags & UFFD_SECURE) + return -EOPNOTSUPP; + fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new, O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS)); if (fd < 0) @@ -1841,6 +1848,18 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, ret = -EINVAL; goto out; } + if ((ctx->flags & UFFD_SECURE) && + (features & UFFD_FEATURE_EVENT_FORK)) { + /* +* We don't support UFFD_FEATURE_EVENT_FORK on a +* secure-mode UFFD: doing so would need us to +* construct the new file object in the context of the +* fork child, and it's not worth it right now. +*/ + ret = -EINVAL; + goto out; + } + /* report all available features and ioctls to userland */ uffdio_api.features = UFFD_API_FEATURES; uffdio_api.ioctls = UFFD_API_IOCTLS; @@ -1942,6 +1961,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) { struct userfaultfd_ctx *ctx; int fd; + static const int uffd_flags = UFFD_SECURE; if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE)) return -EPERM; @@ -1951,8 +1971,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) /* Check the UFFD_* constants for consistency. 
*/ BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK); + BUILD_BUG_ON(UFFD_SHARED_FCNTL_FLAGS & uffd_flags); - if (flags & ~UFFD_SHARED_FCNTL_FLAGS) + if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags)) return -EINVAL; ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL); @@ -1969,8 +1990,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) /* prevent the mm struct to be freed */ mmgrab(ctx->mm); - fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx, - O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS)); + fd = anon_inode_getfd2("[userfaultfd]", &userfaultfd_fops, ctx, + O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS), + ((flags & UFFD_SECURE) ? ANON_INODE_SECURE : 0)); if (fd < 0) { mmdrop(ctx->mm); kmem_cache_free(userfaultfd_ctx_cachep, ctx); diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 48f1a7c2f1f0..12d7d40d7f25 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -231,4 +231,12 @@ struct uffdio_zeropage { __s64 zeropage; }; +/* + * Flags for the userfaultfd(2) system call itself. + */ + +/* + * Create a userfaultfd with MAC security checks enabled. + */ +#define UFFD_SECURE 1 #endif /* _LINUX_USERFAULTFD_H */ -- 2.23.0.700.g56cf767bdb-goog
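The flag handling in the syscall entry point above can be tried out in isolation. The sketch below is a user-space model, not the kernel code: `uffd_check_flags` is a hypothetical helper, the numeric values for `UFFD_CLOEXEC` and `UFFD_NONBLOCK` are assumed x86 values of `O_CLOEXEC` and `O_NONBLOCK`, and the value of `ANON_INODE_SECURE` is assumed because its definition is not shown in this thread. `UFFD_SECURE` exists only in this proposed series.

```c
#include <assert.h>
#include <errno.h>

/* Assumed x86 values of O_CLOEXEC / O_NONBLOCK; other ports may differ. */
#define UFFD_CLOEXEC		02000000
#define UFFD_NONBLOCK		04000
#define UFFD_SHARED_FCNTL_FLAGS	(UFFD_CLOEXEC | UFFD_NONBLOCK)

#define UFFD_SECURE		1	/* value from the patch */
#define ANON_INODE_SECURE	1	/* assumed value, header not shown */

/* Compile-time equivalent of the BUILD_BUG_ON() added by the patch. */
_Static_assert((UFFD_SHARED_FCNTL_FLAGS & UFFD_SECURE) == 0,
	       "UFFD_SECURE must not overlap the fcntl-style flags");

/*
 * Validate userfaultfd(2) flags the way the patched syscall does.
 * Returns 0 and stores the anon_inode flags to pass on, or -EINVAL
 * if an unknown bit is set.
 */
static int uffd_check_flags(int flags, int *anon_inode_flags)
{
	const int uffd_flags = UFFD_SECURE;

	assert(anon_inode_flags != 0);

	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags))
		return -EINVAL;

	*anon_inode_flags = (flags & UFFD_SECURE) ? ANON_INODE_SECURE : 0;
	return 0;
}
```

Keeping the syscall-only flags in a separate mask from the fcntl-style flags is what lets the second argument to the file-creation call stay limited to `O_CLOEXEC` and `O_NONBLOCK`.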
[PATCH 0/7] Harden userfaultfd
Userfaultfd in unprivileged contexts could be very useful. We'd like to
harden userfaultfd to make such unprivileged use less risky. This patch
series allows SELinux to manage userfaultfd file descriptors (via a new
flag, for compatibility with existing code) and allows administrators to
limit userfaultfd to servicing user-mode faults, increasing the
difficulty of using userfaultfd in exploit chains that rely on delaying
kernel faults.

A new anon_inodes interface allows callers to opt into SELinux
management of anonymous file objects. In this mode, anon_inodes creates
new ephemeral inodes for anonymous file objects instead of reusing a
singleton dummy inode. A new LSM hook gives security modules an
opportunity to configure and veto these ephemeral inodes.

Existing anon_inodes users must opt into the new functionality.

Daniel Colascione (7):
  Add a new flags-accepting interface for anonymous inodes
  Add a concept of a "secure" anonymous file
  Add a UFFD_SECURE flag to the userfaultfd API.
  Teach SELinux about a new userfaultfd class
  Let userfaultfd opt out of handling kernel-mode faults
  Allow users to require UFFD_SECURE
  Add a new sysctl for limiting userfaultfd to user mode faults

 Documentation/admin-guide/sysctl/vm.rst | 19 +-
 fs/anon_inodes.c                        | 89 +
 fs/userfaultfd.c                        | 47 +++--
 include/linux/anon_inodes.h             | 27 ++--
 include/linux/lsm_hooks.h               |  8 +++
 include/linux/security.h                |  2 +
 include/linux/userfaultfd_k.h           |  3 +
 include/uapi/linux/userfaultfd.h        | 14
 kernel/sysctl.c                         |  9 +++
 security/security.c                     |  8 +++
 security/selinux/hooks.c                | 68 +++
 security/selinux/include/classmap.h     |  2 +
 12 files changed, 256 insertions(+), 40 deletions(-)

--
2.23.0.700.g56cf767bdb-goog
[PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults
userfaultfd handles page faults from both user and kernel code. Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting userfaultfd object refuse to handle faults from kernel mode, treating these faults as if SIGBUS were always raised, causing the kernel code to fail with EFAULT. A future patch adds a knob allowing administrators to give some processes the ability to create userfaultfd file objects only if they pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will exploit userfaultfd's ability to delay kernel page faults to open timing windows for future exploits. Signed-off-by: Daniel Colascione --- fs/userfaultfd.c | 5 - include/uapi/linux/userfaultfd.h | 6 ++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 1123089c3d55..986d23b2cd33 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -389,6 +389,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) if (ctx->features & UFFD_FEATURE_SIGBUS) goto out; + if ((vmf->flags & FAULT_FLAG_USER) == 0 && + ctx->flags & UFFD_USER_MODE_ONLY) + goto out; /* * If it's already released don't get it. This avoids to loop @@ -1959,7 +1962,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags) { struct userfaultfd_ctx *ctx; int fd; - static const int uffd_flags = UFFD_SECURE; + static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY; if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE)) return -EPERM; diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 12d7d40d7f25..eadd1497e7b5 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -239,4 +239,10 @@ struct uffdio_zeropage { * Create a userfaultfd with MAC security checks enabled. */ #define UFFD_SECURE 1 + +/* + * Create a userfaultfd that can handle page faults only in user mode. 
+ */ +#define UFFD_USER_MODE_ONLY 2 + #endif /* _LINUX_USERFAULTFD_H */ -- 2.23.0.700.g56cf767bdb-goog
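The decision added to handle_userfault() in this patch boils down to a small predicate. The following is a user-space sketch of that predicate only; `uffd_should_handle` is a hypothetical name, and the bit value used for `FAULT_FLAG_USER` is an assumption made for the sake of a runnable example.

```c
#include <assert.h>
#include <stdbool.h>

#define FAULT_FLAG_USER		0x40	/* assumed bit value for this sketch */
#define UFFD_USER_MODE_ONLY	2	/* value from the patch */

/*
 * Return true if the userfaultfd would go on to handle the fault,
 * false if it refuses (the patch treats a refused fault as if SIGBUS
 * were raised, so the faulting kernel code fails with EFAULT).
 */
static bool uffd_should_handle(unsigned int fault_flags, int ctx_flags)
{
	bool from_kernel = (fault_flags & FAULT_FLAG_USER) == 0;

	/* Only kernel-mode faults on a user-mode-only object are refused. */
	if (from_kernel && (ctx_flags & UFFD_USER_MODE_ONLY))
		return false;
	return true;
}
```

For example, `uffd_should_handle(0, UFFD_USER_MODE_ONLY)` is false, while the same call with `FAULT_FLAG_USER` set in the first argument is true.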
[GIT PULL] MIPS fixes
Hi Linus,

Here are a few MIPS fixes for 5.4; please pull.

Thanks,
    Paul

The following changes since commit da0c9ea146cbe92b832f1b0f694840ea8eb33cce:

  Linux 5.4-rc2 (2019-10-06 14:27:30 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git tags/mips_fixes_5.4_2

for you to fetch changes up to 2f2b4fd674cadd8c6b40eb629e140a14db4068fd:

  MIPS: Disable Loongson MMI instructions for kernel build (2019-10-10 11:58:52 -0700)

A few MIPS fixes for 5.4:

- Build fixes for CONFIG_OPTIMIZE_INLINING=y builds in which the
  compiler may choose not to inline __xchg() & __cmpxchg().

- A build fix for Loongson configurations with GCC 9.x.

- Expose some extra HWCAP bits to indicate support for various
  instruction set extensions to userland.

- Fix bad stack access in firmware handling code for old SNI
  RM200/300/400 machines.

Jiaxun Yang (1):
      MIPS: elf_hwcap: Export userspace ASEs

Paul Burton (1):
      MIPS: Disable Loongson MMI instructions for kernel build

Thomas Bogendoerfer (3):
      MIPS: include: Mark __cmpxchg as __always_inline
      MIPS: include: Mark __xchg as __always_inline
      MIPS: fw: sni: Fix out of bounds init of o32 stack

 arch/mips/fw/sni/sniprom.c         |  2 +-
 arch/mips/include/asm/cmpxchg.h    |  9 +
 arch/mips/include/uapi/asm/hwcap.h | 11 +++
 arch/mips/kernel/cpu-probe.c       | 33 +
 arch/mips/loongson64/Platform      |  4
 arch/mips/vdso/Makefile            |  1 +
 6 files changed, 55 insertions(+), 5 deletions(-)
Re: [Outreachy kernel] [PATCH v2 3/5] staging: octeon: remove typedef declaration for cvmx_fau_reg_32
On Sat, 12 Oct 2019, Wambui Karuga wrote: > Remove typedef declaration for enum cvmx_fau_reg_32. > Also replace its previous uses with new declaration format. > Issue found by checkpatch.pl > > Signed-off-by: Wambui Karuga > --- > drivers/staging/octeon/octeon-stubs.h | 14 -- > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/drivers/staging/octeon/octeon-stubs.h > b/drivers/staging/octeon/octeon-stubs.h > index 0991be329139..40f0cfee0dff 100644 > --- a/drivers/staging/octeon/octeon-stubs.h > +++ b/drivers/staging/octeon/octeon-stubs.h > @@ -201,9 +201,9 @@ union cvmx_helper_link_info { > } s; > }; > > -typedef enum { > +enum cvmx_fau_reg_32 { > CVMX_FAU_REG_32_START = 0, > -} cvmx_fau_reg_32_t; > +}; > > typedef enum { > CVMX_FAU_OP_SIZE_8 = 0, > @@ -1178,16 +1178,18 @@ union cvmx_gmxx_rxx_rx_inbnd { > } s; > }; > > -static inline int32_t cvmx_fau_fetch_and_add32(cvmx_fau_reg_32_t reg, > +static inline int32_t cvmx_fau_fetch_and_add32(enum cvmx_fau_reg_32 reg, > int32_t value) These int32_t's don't look very desirable either. If there is only one possible definition, you can just replace it by what it is defined to be. julia > { > return value; > } > > -static inline void cvmx_fau_atomic_add32(cvmx_fau_reg_32_t reg, int32_t > value) > +static inline void cvmx_fau_atomic_add32(enum cvmx_fau_reg_32 reg, > + int32_t value) > { } > > -static inline void cvmx_fau_atomic_write32(cvmx_fau_reg_32_t reg, int32_t > value) > +static inline void cvmx_fau_atomic_write32(enum cvmx_fau_reg_32 reg, > +int32_t value) > { } > > static inline uint64_t cvmx_scratch_read64(uint64_t address) > @@ -1364,7 +1366,7 @@ static inline int cvmx_spi_restart_interface(int > interface, > } > > static inline void cvmx_fau_async_fetch_and_add32(uint64_t scraddr, > - cvmx_fau_reg_32_t reg, > + enum cvmx_fau_reg_32 reg, > int32_t value) > { } > > -- > 2.23.0
Re: [Outreachy kernel] [PATCH v2 0/5] Remove typedef declarations in staging: octeon
On Sat, 12 Oct 2019, Wambui Karuga wrote: > This patchset removes the addition of new typedefs data types in octeon, > along with replacing the previous uses with the new declaration format. > > v2 of the series removes the obsolete "_t" notation in the named types. > > Wambui Karuga (5): > staging: octeon: remove typedef declaration for cvmx_wqe > staging: octeon: remove typedef declaration for cvmx_helper_link_info > staging: octeon: remove typedef declaration for cvmx_fau_reg_32 > staging: octeon: remove typedef declartion for cvmx_pko_command_word0 > staging: octeon: remove typedef declaration for cvmx_fau_op_size > > drivers/staging/octeon/ethernet-mdio.c | 6 +-- > drivers/staging/octeon/ethernet-rgmii.c | 4 +- > drivers/staging/octeon/ethernet-rx.c | 6 +-- > drivers/staging/octeon/ethernet-tx.c | 4 +- > drivers/staging/octeon/ethernet.c| 6 +-- > drivers/staging/octeon/octeon-ethernet.h | 2 +- > drivers/staging/octeon/octeon-stubs.h| 56 > 7 files changed, 43 insertions(+), 41 deletions(-) For the series: Acked-by: Julia Lawall > > -- > 2.23.0
mac80211: Checking a kmemdup() call in ieee80211_send_assoc()
Hello, I tried out another script for the semantic patch language. This source code analysis approach points out that the implementation of the function “ieee80211_send_assoc” still contains an unchecked call of the function “kmemdup”. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/mlme.c?id=1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a#n980 https://elixir.bootlin.com/linux/v5.4-rc2/source/net/mac80211/mlme.c#L980 What do you think about improving it? Regards, Markus
drivers/net/ethernet/pensando/ionic/ionic_lif.c:333:2: error: implicit declaration of function 'dynamic_hex_dump'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a commit: 77ceb68e29ccd25d923b6af59e74ecaf736cc4b7 ionic: Add notifyq support date: 5 weeks ago config: x86_64-randconfig-a002-201941 (attached as .config) compiler: gcc-6 (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 reproduce: git checkout 77ceb68e29ccd25d923b6af59e74ecaf736cc4b7 # save the attached .config to linux build tree make ARCH=x86_64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot All errors (new ones prefixed by >>): drivers/net/ethernet/pensando/ionic/ionic_lif.c: In function 'ionic_notifyq_service': >> drivers/net/ethernet/pensando/ionic/ionic_lif.c:333:2: error: implicit >> declaration of function 'dynamic_hex_dump' >> [-Werror=implicit-function-declaration] dynamic_hex_dump("event ", DUMP_PREFIX_OFFSET, 16, 1, ^~~~ cc1: some warnings being treated as errors vim +/dynamic_hex_dump +333 drivers/net/ethernet/pensando/ionic/ionic_lif.c 311 312 static bool ionic_notifyq_service(struct ionic_cq *cq, 313struct ionic_cq_info *cq_info) 314 { 315 union ionic_notifyq_comp *comp = cq_info->cq_desc; 316 struct net_device *netdev; 317 struct ionic_queue *q; 318 struct ionic_lif *lif; 319 u64 eid; 320 321 q = cq->bound_q; 322 lif = q->info[0].cb_arg; 323 netdev = lif->netdev; 324 eid = le64_to_cpu(comp->event.eid); 325 326 /* Have we run out of new completions to process? 
*/ 327 if (eid <= lif->last_eid) 328 return false; 329 330 lif->last_eid = eid; 331 332 dev_dbg(lif->ionic->dev, "notifyq event:\n"); > 333 dynamic_hex_dump("event ", DUMP_PREFIX_OFFSET, 16, 1, 334 comp, sizeof(*comp), true); 335 336 switch (le16_to_cpu(comp->event.ecode)) { 337 case IONIC_EVENT_LINK_CHANGE: 338 netdev_info(netdev, "Notifyq IONIC_EVENT_LINK_CHANGE eid=%lld\n", 339 eid); 340 netdev_info(netdev, 341 " link_status=%d link_speed=%d\n", 342 le16_to_cpu(comp->link_change.link_status), 343 le32_to_cpu(comp->link_change.link_speed)); 344 break; 345 case IONIC_EVENT_RESET: 346 netdev_info(netdev, "Notifyq IONIC_EVENT_RESET eid=%lld\n", 347 eid); 348 netdev_info(netdev, " reset_code=%d state=%d\n", 349 comp->reset.reset_code, 350 comp->reset.state); 351 break; 352 default: 353 netdev_warn(netdev, "Notifyq unknown event ecode=%d eid=%lld\n", 354 comp->event.ecode, eid); 355 break; 356 } 357 358 return true; 359 } 360 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
SUNRPC: Checking a kmemdup() call in xdr_netobj_dup()
Hello, I tried out another script for the semantic patch language. This source code analysis approach points out that the implementation of the function “xdr_netobj_dup” still contains an unchecked call of the function “kmemdup”. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/sunrpc/xdr.h?id=1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a#n167 https://elixir.bootlin.com/linux/v5.4-rc2/source/include/linux/sunrpc/xdr.h#L167 What do you think about improving it? Regards, Markus
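For readers unfamiliar with the pattern being flagged, here is a user-space sketch of what a checked duplication could look like. It is not the kernel function and not a proposed patch: `struct netobj`, `dup_bytes` and `netobj_dup_checked` are hypothetical stand-ins, with `malloc()` plus `memcpy()` playing the role of kmemdup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed opaque object. */
struct netobj {
	unsigned int len;
	unsigned char *data;
};

/* User-space stand-in for kmemdup(): returns NULL on allocation failure. */
static void *dup_bytes(const void *src, size_t len)
{
	/* Allocate at least one byte so a zero-length copy is not NULL. */
	void *p = malloc(len ? len : 1);

	if (p && len)
		memcpy(p, src, len);
	return p;
}

/*
 * Duplicate src into dst and report failure to the caller instead of
 * leaving dst with a NULL data pointer and a non-zero length.
 */
static int netobj_dup_checked(struct netobj *dst, const struct netobj *src)
{
	assert(dst != NULL && src != NULL);

	dst->data = dup_bytes(src->data, src->len);
	if (!dst->data) {
		dst->len = 0;
		return -ENOMEM;
	}
	dst->len = src->len;
	return 0;
}
```

The point of the check is that length and pointer stay consistent: a caller that ignores the return value still never sees a non-zero `len` paired with a NULL `data`.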
Re: drivers/net/ethernet/pensando/ionic/ionic_lif.c:333:2: error: implicit declaration of function 'dynamic_hex_dump'; did you mean 'seq_hex_dump'?
On 10/12/19 10:45 AM, kbuild test robot wrote: Hi Shannon, FYI, the error/warning still remains. tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 1c0cc5f1ae5ee5a6913704c0d75a6e99604ee30a commit: 77ceb68e29ccd25d923b6af59e74ecaf736cc4b7 ionic: Add notifyq support date: 5 weeks ago config: x86_64-randconfig-a002-201941 (attached as .config) compiler: gcc-7 (Debian 7.4.0-13) 7.4.0 reproduce: git checkout 77ceb68e29ccd25d923b6af59e74ecaf736cc4b7 # save the attached .config to linux build tree make ARCH=x86_64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot Hmmm, I thought Arnd Bergmann had already addressed these, and I Acked: https://lore.kernel.org/netdev/91b69922-926a-9c27-3a08-e2db2d7ea...@pensando.io/ Dave, is there something more I need to do here? sln All errors (new ones prefixed by >>): drivers/net/ethernet/pensando/ionic/ionic_lif.c: In function 'ionic_notifyq_service': drivers/net/ethernet/pensando/ionic/ionic_lif.c:333:2: error: implicit declaration of function 'dynamic_hex_dump'; did you mean 'seq_hex_dump'? [-Werror=implicit-function-declaration] dynamic_hex_dump("event ", DUMP_PREFIX_OFFSET, 16, 1, ^~~~ seq_hex_dump cc1: some warnings being treated as errors vim +333 drivers/net/ethernet/pensando/ionic/ionic_lif.c 311 312 static bool ionic_notifyq_service(struct ionic_cq *cq, 313 struct ionic_cq_info *cq_info) 314 { 315 union ionic_notifyq_comp *comp = cq_info->cq_desc; 316 struct net_device *netdev; 317 struct ionic_queue *q; 318 struct ionic_lif *lif; 319 u64 eid; 320 321 q = cq->bound_q; 322 lif = q->info[0].cb_arg; 323 netdev = lif->netdev; 324 eid = le64_to_cpu(comp->event.eid); 325 326 /* Have we run out of new completions to process? 
*/ 327 if (eid <= lif->last_eid) 328 return false; 329 330 lif->last_eid = eid; 331 332 dev_dbg(lif->ionic->dev, "notifyq event:\n"); > 333 dynamic_hex_dump("event ", DUMP_PREFIX_OFFSET, 16, 1, 334 comp, sizeof(*comp), true); 335 336 switch (le16_to_cpu(comp->event.ecode)) { 337 case IONIC_EVENT_LINK_CHANGE: 338 netdev_info(netdev, "Notifyq IONIC_EVENT_LINK_CHANGE eid=%lld\n", 339 eid); 340 netdev_info(netdev, 341 " link_status=%d link_speed=%d\n", 342 le16_to_cpu(comp->link_change.link_status), 343 le32_to_cpu(comp->link_change.link_speed)); 344 break; 345 case IONIC_EVENT_RESET: 346 netdev_info(netdev, "Notifyq IONIC_EVENT_RESET eid=%lld\n", 347 eid); 348 netdev_info(netdev, " reset_code=%d state=%d\n", 349 comp->reset.reset_code, 350 comp->reset.state); 351 break; 352 default: 353 netdev_warn(netdev, "Notifyq unknown event ecode=%d eid=%lld\n", 354 comp->event.ecode, eid); 355 break; 356 } 357 358 return true; 359 } 360 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
[PATCH v2 5/5] staging: octeon: remove typedef declaration for cvmx_fau_op_size
Remove addition of new typedef for enum cvmx_fau_op_size. Issue found by checkpatch.pl Signed-off-by: Wambui Karuga --- drivers/staging/octeon/octeon-stubs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/octeon/octeon-stubs.h b/drivers/staging/octeon/octeon-stubs.h index db2d6f64b666..1b72f02a361f 100644 --- a/drivers/staging/octeon/octeon-stubs.h +++ b/drivers/staging/octeon/octeon-stubs.h @@ -205,12 +205,12 @@ enum cvmx_fau_reg_32 { CVMX_FAU_REG_32_START = 0, }; -typedef enum { +enum cvmx_fau_op_size { CVMX_FAU_OP_SIZE_8 = 0, CVMX_FAU_OP_SIZE_16 = 1, CVMX_FAU_OP_SIZE_32 = 2, CVMX_FAU_OP_SIZE_64 = 3 -} cvmx_fau_op_size_t; +}; typedef enum { CVMX_SPI_MODE_UNKNOWN = 0, -- 2.23.0
[PATCH v2 3/5] staging: octeon: remove typedef declaration for cvmx_fau_reg_32
Remove typedef declaration for enum cvmx_fau_reg_32. Also replace its previous uses with new declaration format. Issue found by checkpatch.pl Signed-off-by: Wambui Karuga --- drivers/staging/octeon/octeon-stubs.h | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/staging/octeon/octeon-stubs.h b/drivers/staging/octeon/octeon-stubs.h index 0991be329139..40f0cfee0dff 100644 --- a/drivers/staging/octeon/octeon-stubs.h +++ b/drivers/staging/octeon/octeon-stubs.h @@ -201,9 +201,9 @@ union cvmx_helper_link_info { } s; }; -typedef enum { +enum cvmx_fau_reg_32 { CVMX_FAU_REG_32_START = 0, -} cvmx_fau_reg_32_t; +}; typedef enum { CVMX_FAU_OP_SIZE_8 = 0, @@ -1178,16 +1178,18 @@ union cvmx_gmxx_rxx_rx_inbnd { } s; }; -static inline int32_t cvmx_fau_fetch_and_add32(cvmx_fau_reg_32_t reg, +static inline int32_t cvmx_fau_fetch_and_add32(enum cvmx_fau_reg_32 reg, int32_t value) { return value; } -static inline void cvmx_fau_atomic_add32(cvmx_fau_reg_32_t reg, int32_t value) +static inline void cvmx_fau_atomic_add32(enum cvmx_fau_reg_32 reg, +int32_t value) { } -static inline void cvmx_fau_atomic_write32(cvmx_fau_reg_32_t reg, int32_t value) +static inline void cvmx_fau_atomic_write32(enum cvmx_fau_reg_32 reg, + int32_t value) { } static inline uint64_t cvmx_scratch_read64(uint64_t address) @@ -1364,7 +1366,7 @@ static inline int cvmx_spi_restart_interface(int interface, } static inline void cvmx_fau_async_fetch_and_add32(uint64_t scraddr, - cvmx_fau_reg_32_t reg, + enum cvmx_fau_reg_32 reg, int32_t value) { } -- 2.23.0
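The effect of this kind of change is easiest to see with both spellings side by side. The snippet below is a stand-alone sketch that reuses the names from the patch; `old_style_op_size_t` is a hypothetical name added only to show the form that checkpatch.pl complains about.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: an anonymous enum hidden behind a typedef (hypothetical name). */
typedef enum {
	OLD_STYLE_OP_SIZE_8 = 0,
} old_style_op_size_t;

/* New form, as in the patch: a named enum, spelled out at each use. */
enum cvmx_fau_reg_32 {
	CVMX_FAU_REG_32_START = 0,
};

/* Stub from the patch: callers now write "enum cvmx_fau_reg_32" explicitly. */
static inline int32_t cvmx_fau_fetch_and_add32(enum cvmx_fau_reg_32 reg,
						int32_t value)
{
	(void)reg;	/* the stub ignores the register argument */
	return value;
}
```

Spelling out `enum` at each use makes it obvious at the call site that the parameter is an enumeration rather than an opaque scalar type.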