Re: Why add the general notification queue and its sources
Hi,

On 06/09/2019 16:53, Linus Torvalds wrote:

On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds wrote:

This is why I like pipes. You can use them today. They are simple, and extensible, and you don't need to come up with a new subsystem and some untested ad-hoc thing that nobody has actually used.

The only _real_ complexity is to make sure that events are reliably parseable. That's where you really want to use the Linux-only "packet pipe" thing, because otherwise you have to have size markers or other things to delineate events. But if you do that, then it really becomes trivial.

And I checked, we made it available to user space, even if the original reason for that code was kernel-only autofs use: you just need to make the pipe be O_DIRECT. This overly stupid program shows off the feature:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd[2];
        char buf[10];

        pipe2(fd, O_DIRECT | O_NONBLOCK);
        write(fd[1], "hello", 5);
        write(fd[1], "hi", 2);
        read(fd[0], buf, sizeof(buf));
        read(fd[0], buf, sizeof(buf));
        return 0;
    }

and if you strace it (because I was too lazy to add error handling or printing of results), you'll see

    write(4, "hello", 5) = 5
    write(4, "hi", 2)    = 2
    read(3, "hello", 10) = 5
    read(3, "hi", 10)    = 2

note how you got packets of data on the reader side, instead of getting the traditional "just buffer it as a stream".

So now you can even have multiple readers of the same event pipe, and packetization is obvious and trivial. Of course, I'm not sure why you'd want to have multiple readers, and you'd lose _ordering_, but if all events are independent, this _might_ be a useful thing in a threaded environment. Maybe.

(Side note: a zero-sized write will not cause a zero-sized packet. It will just be dropped).

Linus

The events are generally not independent - we would need ordering either implicit in the protocol or explicit in the messages.
We also need to know when messages have been dropped - it doesn't need to be anything fancy, just some indication that, since we last did a read, there are messages that were lost, most likely due to buffer overrun. That is why the initial idea was to use netlink, since it solves a lot of those issues. The downside was that the indirect nature of netlink sockets made it tricky to know the namespace of the process to which a message was to be delivered (and hence whether it should be delivered at all),

Steve.
Re: Why add the general notification queue and its sources
Hi,

On 05/09/2019 18:19, Linus Torvalds wrote:

On Thu, Sep 5, 2019 at 10:01 AM David Howells wrote:

I'm just going to be very blunt about this, and say that there is no way I can merge any of this *ever*, unless other people stand up and say that (a) they'll use it and (b) they'll actively develop it and participate in testing and coding.

Besides the core notification buffer which ties this together, there are a number of sources that I've implemented, not all of which are in this patch series:

You've at least now answered part of the "Why", but you didn't actually answer the whole "another developer" part. I really don't like how nobody other than you seems to even look at any of the key handling patches. Because nobody else seems to care. This seems to be another new subsystem / driver that has the same pattern.

If it's all just you, I don't want to merge it, because I really want more than just other developers doing "Reviewed-by" after looking at somebody else's code that they don't actually use or really care about. See what I'm saying? New features that go into the kernel should have multiple users. Not a single developer who pushes both the kernel feature and the single use of that feature.

This very much comes from me reverting the key ACL pull. Not only did I revert it, ABSOLUTELY NOBODY even reacted to the revert. Nobody stepped up and said they want that new ACL code, and pushed for a fix. There was some very little murmuring about it when Mimi at least figured out _why_ it broke, but other than that all the noise I saw about the revert was Eric Biggers pointing out it broke other things too, and that it had actually broken some test suites. But since it hadn't even been in linux-next, that too had been noticed much too late.

See what I'm saying? This whole "David Howells does his own features that nobody else uses" needs to stop. You need to have a champion.
I just don't feel safe pulling these kinds of changes from you, because I get the feeling that ABSOLUTELY NOBODY ELSE ever really looked at it or really cared. Most of the patches have nobody else even Cc'd, and even the patches that do have some "Reviewed-by" feel more like somebody else went "ok, the change looks fine to me", without any other real attachment to the code.

New kernel features and interfaces really need to have a higher barrier to entry than one developer working on his or her own thing. Is that a change from 25 years ago? Oh yes it is. We can point to lots of "single developer did a thing" from years past. But things have changed. And once bitten, twice shy: I really am a _lot_ more nervous about all these key changes now.

Linus

There are a number of potential users, some waiting just to have a mechanism to avoid the racy alternatives to (for example) parsing /proc/mounts repeatedly, others perhaps a bit further away, but who have nonetheless expressed interest in having an interface which allows notifications for mounts.

The subject of mount notifications has been discussed at LSF/MM in the past too; I proposed it as a topic a little while back:

https://www.spinics.net/lists/linux-block/msg07653.html

and David's patch set is a potential solution to some of the issues that I raised there. The original series for the new mount API came from an idea of Al/Miklos which was also presented at LSF/MM 2017, and this is a follow-on project. So it has not come out of nowhere; it has been discussed in various forums over a period of time.

Originally, there was a proposal to use netlink for the notifications, however that didn't seem to meet with general approval, even though Ian Kent did some work towards figuring out whether that would be a useful direction to go in.
David has since come up with the proposal presented here, which is intended to improve on the original proposal in various ways - mostly making the notifications more efficient (i.e. smaller) and also generic enough that it might have uses beyond the original intent of just being a mount notification mechanism.

The original reason for the mount notification mechanism was so that we are able to provide information to GUIs and similar filesystem and storage management tools, matching the state of the filesystem with the state of the underlying devices. This is part of a larger project entitled "Project Springfield" to try and provide better management tools for storage and filesystems. I've copied David Lehman in, since he can provide a wider view on this topic.

It is something that I do expect will receive wide use, and which will be tested carefully. I know that Ian Kent has started work on some support for libmount, for example, even outside of autofs. We do regularly hear from customers that better storage and filesystem management tools are something that they consider very important, so that is why we are spending such a lot of effort in trying to improve them.
Re: [PATCH 18/25] gfs2: Convert to properly refcounting bdi
Hi,

On 12/04/17 09:16, Christoph Hellwig wrote:

On Wed, Mar 29, 2017 at 12:56:16PM +0200, Jan Kara wrote:

Similarly to set_bdev_super(), GFS2 just used the block device reference to the bdi. Convert it to properly getting a bdi reference. The reference will get automatically dropped on superblock destruction.

Hmm, why isn't gfs2 simply using the generic mount_bdev code? Otherwise looks fine:

Reviewed-by: Christoph Hellwig

It is, more or less. However we landed up copying it because we needed a slight modification in order to cope with the metafs mounts. There may be scope to factor out the common parts, I guess. We cannot select the root dentry until after we've parsed the mount command line, so it is really just the last part of the function that is different,

Steve.
Re: [Lsf-pc] [LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces
Hi,

On 10/01/17 10:14, Jan Kara wrote:

Hi, On Tue 10-01-17 09:44:59, Steven Whitehouse wrote:

I had originally thought about calling the proposal "kernel/userland interface", however that seemed a bit vague, and "management interfaces" seems like a better title since it is, I hope, a bit clearer about the kind of thing that I'm thinking about in this case.

There are a number of possible sub-topics, and I hope that I might find a few more before LSF too. One is space management (we have statfs, but currently no notifications for thresholds crossed etc., so everything is polled. That is ok sometimes, but statfs can be expensive in the case of distributed filesystems, if 100% accurate. We could just have ENOSPC notifications for 100% full, or something more generic). Another is state transitions (is the fs running normally, or has it gone read-only/withdrawn/etc. due to I/O errors?), and a further topic would be working towards a common interface for fs statistics (at the moment each fs defines its own interface).

One potential implementation, at least for the first two sub-topics, would be to use something along the lines of the quota netlink interface, but since few ideas survive first contact with the community at large, I'm throwing this out for further discussion and feedback on whether this approach is considered the right way to go. Assuming the topic is accepted, my intention would be to gather together some additional sub-topics relating to fs management to go along with those I mentioned above, and I'd be very interested to hear of any other issues that could be usefully added to the list for discussion.

So this topic came up last year and probably the year before as well (heh, I can even find some patches from 2011 [1]). I think the latest attempt at what you suggest was here [2]. So clearly there's some interest in these interfaces, but not enough to actually drive anything to completion.
So for this topic to be useful, I think you need to go at least through the patches in [2] and the comments on them, and have a concrete proposal that can be discussed, and some commitment (not necessarily from yourself) that someone is going to devote time to implement it. Because generally nobody seems to be opposed to the abstract idea, but once it gets to the implementation details, it is non-trivial to get some wider agreement (statx anybody? ;)).

Honza

[1] https://lkml.org/lkml/2011/8/18/170
[2] https://lkml.org/lkml/2015/6/16/456

Yes, statx is something else I'd like to see progress too :-) Going back to this topic though, I agree wrt having a concrete proposal, and I'll try and have something ready for LSF; we have a few weeks in hand. I'll collect up the details of the previous efforts (including Lukas' suggestion) and see how far we can get in the meantime,

Steve.

-- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces
Hi,

This is a request both to attend LSF/MM and also a topic proposal. The last couple of years I've proposed a topic on the block/fs interface. I'm happy to do that again, if there is consensus that this would be useful, however this time I thought that I'd propose something a bit different.

I had originally thought about calling the proposal "kernel/userland interface", however that seemed a bit vague, and "management interfaces" seems like a better title since it is, I hope, a bit clearer about the kind of thing that I'm thinking about in this case.

There are a number of possible sub-topics, and I hope that I might find a few more before LSF too. One is space management (we have statfs, but currently no notifications for thresholds crossed etc., so everything is polled. That is ok sometimes, but statfs can be expensive in the case of distributed filesystems, if 100% accurate. We could just have ENOSPC notifications for 100% full, or something more generic). Another is state transitions (is the fs running normally, or has it gone read-only/withdrawn/etc. due to I/O errors?), and a further topic would be working towards a common interface for fs statistics (at the moment each fs defines its own interface).

One potential implementation, at least for the first two sub-topics, would be to use something along the lines of the quota netlink interface, but since few ideas survive first contact with the community at large, I'm throwing this out for further discussion and feedback on whether this approach is considered the right way to go. Assuming the topic is accepted, my intention would be to gather together some additional sub-topics relating to fs management to go along with those I mentioned above, and I'd be very interested to hear of any other issues that could be usefully added to the list for discussion.

My interest in other topics is fairly wide... I'm, as usual, interested in all filesystem related topics and a good number of block device and mm topics too.
Anything relating to vfs, xfs, ext*, btrfs, gfs2, overlayfs, NFS/CIFS, and technologies such as copy-offload, DAX, reflink, RDMA, NVMe(F), etc.,

Steve.