Re: Why add the general notification queue and its sources

2019-09-06 Thread Steven Whitehouse

Hi,

On 06/09/2019 16:53, Linus Torvalds wrote:

On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds wrote:

This is why I like pipes. You can use them today. They are simple, and
extensible, and you don't need to come up with a new subsystem and
some untested ad-hoc thing that nobody has actually used.

The only _real_ complexity is to make sure that events are reliably parseable.

That's where you really want to use the Linux-only "packet pipe"
thing, because otherwise you have to have size markers or other things
to delineate events. But if you do that, then it really becomes
trivial.

And I checked, we made it available to user space, even if the
original reason for that code was kernel-only autofs use: you just
need to make the pipe be O_DIRECT.

This overly stupid program shows off the feature:

 #define _GNU_SOURCE
 #include <fcntl.h>      /* pipe2(), O_DIRECT, O_NONBLOCK */
 #include <unistd.h>     /* read(), write() */

 int main(int argc, char **argv)
 {
         int fd[2];
         char buf[10];

         /* O_DIRECT turns this into a "packet pipe": every write
            becomes one packet, and every read returns one packet */
         pipe2(fd, O_DIRECT | O_NONBLOCK);
         write(fd[1], "hello", 5);
         write(fd[1], "hi", 2);
         read(fd[0], buf, sizeof(buf));
         read(fd[0], buf, sizeof(buf));
         return 0;
 }

and if you strace it (because I was too lazy to add error handling or
printing of results), you'll see

 write(4, "hello", 5)= 5
 write(4, "hi", 2)   = 2
 read(3, "hello", 10)= 5
 read(3, "hi", 10)   = 2

note how you got packets of data on the reader side, instead of
getting the traditional "just buffer it as a stream".

So now you can even have multiple readers of the same event pipe, and
packetization is obvious and trivial. Of course, I'm not sure why
you'd want to have multiple readers, and you'd lose _ordering_, but if
all events are independent, this _might_ be a useful thing in a
threaded environment. Maybe.
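
For illustration, here is a variant of the toy program above (same caveats: no error handling; the reader count and packet sizes are arbitrary choices for the example). Each read() returns one whole packet, never a fragment, though which reader gets which packet is unordered:

 #define _GNU_SOURCE
 #include <fcntl.h>
 #include <stdio.h>
 #include <string.h>
 #include <sys/wait.h>
 #include <unistd.h>

 int main(void)
 {
         int fd[2];

         pipe2(fd, O_DIRECT);

         /* two competing readers on the same packet pipe */
         for (int r = 0; r < 2; r++) {
                 if (fork() == 0) {
                         char buf[16];
                         ssize_t n;

                         close(fd[1]);   /* so EOF arrives once the writer is done */
                         while ((n = read(fd[0], buf, sizeof(buf))) > 0)
                                 printf("reader %d: %.*s\n", r, (int)n, buf);
                         _exit(0);
                 }
         }

         for (int i = 0; i < 4; i++) {
                 char msg[16];

                 snprintf(msg, sizeof(msg), "event-%d", i);
                 write(fd[1], msg, strlen(msg));
         }
         close(fd[1]);
         close(fd[0]);
         wait(NULL);
         wait(NULL);
         return 0;
 }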

(Side note: a zero-sized write will not cause a zero-sized packet. It
will just be dropped).

Linus


The events are generally not independent - we would need ordering, either 
implicit in the protocol or explicit in the messages. We also need to 
know when messages have been dropped; nothing fancy, just some indication 
that, since we last did a read, there are messages that got lost, most 
likely due to buffer overrun.
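
For illustration, one way to get both properties on top of a packet pipe would be to carry a sequence counter in every message, so a reader can at least detect a gap after a buffer overrun. A minimal sketch; the record layout and names below are made up for the example, not a proposed ABI:

 #define _GNU_SOURCE
 #include <fcntl.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>

 /* hypothetical event record: a sequence number plus a payload */
 struct event {
         uint64_t seq;
         char payload[56];
 };

 int main(void)
 {
         int fd[2];
         struct event ev, in;
         uint64_t next = 0, expect = 0;

         /* non-blocking packet pipe: if the buffer fills up, a write
            fails with EAGAIN and that event is simply lost */
         pipe2(fd, O_DIRECT | O_NONBLOCK);

         for (int i = 0; i < 3; i++) {
                 memset(&ev, 0, sizeof(ev));
                 ev.seq = next++;
                 snprintf(ev.payload, sizeof(ev.payload), "event %d", i);
                 write(fd[1], &ev, sizeof(ev));
         }

         /* the reader spots dropped messages as gaps in seq */
         while (read(fd[0], &in, sizeof(in)) == sizeof(in)) {
                 if (in.seq != expect)
                         printf("lost %llu message(s)\n",
                                (unsigned long long)(in.seq - expect));
                 printf("seq %llu: %s\n",
                        (unsigned long long)in.seq, in.payload);
                 expect = in.seq + 1;
         }
         return 0;
 }

Ordering within a single pipe comes for free; the sequence number is only there to make drops visible.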


That is why the initial idea was to use netlink, since it solves a lot 
of those issues. The downside was that the indirect nature of netlink 
sockets made it tricky to know the namespace of the process to which 
the message was to be delivered (and hence whether it should be 
delivered at all),


Steve.



Re: Why add the general notification queue and its sources

2019-09-05 Thread Steven Whitehouse

Hi,

On 05/09/2019 18:19, Linus Torvalds wrote:

On Thu, Sep 5, 2019 at 10:01 AM David Howells  wrote:

I'm just going to be very blunt about this, and say that there is no
way I can merge any of this *ever*, unless other people stand up and
say that

  (a) they'll use it

and

  (b) they'll actively develop it and participate in testing and coding

Besides the core notification buffer which ties this together, there are a
number of sources that I've implemented, not all of which are in this patch
series:

You've at least now answered part of the "Why", but you didn't
actually answer the whole "another developer" part.

I really don't like how nobody but you seems to even look at any
of the key handling patches. Because nobody else seems to care.

This seems to be another new subsystem / driver that has the same
pattern. If it's all just you, I don't want to merge it, because I
really want more than just other developers doing "Reviewed-by" after
looking at somebody else's code that they don't actually use or really
care about.

See what I'm saying?

New features that go into the kernel should have multiple users. Not a
single developer who pushes both the kernel feature and the single use
of that feature.

This very much comes from me reverting the key ACL pull. Not only did
I revert it, ABSOLUTELY NOBODY even reacted to the revert. Nobody
stepped up and said that they want that new ACL code, and pushed for a
fix. There was a little murmuring about it when Mimi at least
figured out _why_ it broke, but other than that all the noise I saw
about the revert was Eric Biggers pointing out it broke other things
too, and that it had actually broken some test suites. But since it
hadn't even been in linux-next, that too had been noticed much too
late.

See what I'm saying? This whole "David Howells does his own features
that nobody else uses" needs to stop. You need to have a champion. I
just don't feel safe pulling these kinds of changes from you, because
I get the feeling that ABSOLUTELY NOBODY ELSE ever really looked at it
or really cared.

Most of the patches have nobody else even Cc'd, and even the patches
that do have some "Reviewed-by" feel more like somebody else went "ok,
the change looks fine to me", without any other real attachment to the
code.

New kernel features and interfaces really need to have a higher
barrier of entry than one developer working on his or her own thing.

Is that a change from 25 years ago? Yes, it is. We can point to lots
of "single developer did a thing" from years past. But things have
changed. And once bitten, twice shy: I really am a _lot_ more nervous
about all these key changes now.

 Linus


There are a number of potential users: some are waiting just for a 
mechanism that avoids racy approaches such as repeatedly parsing 
/proc/mounts; others are perhaps a bit further away, but have 
nonetheless expressed interest in an interface which provides 
notifications for mounts.


The subject of mount notifications has been discussed at LSF/MM in the 
past too; I proposed it as a topic a little while back 
(https://www.spinics.net/lists/linux-block/msg07653.html) and David's 
patch set is a potential solution to some of the issues that I raised 
there. The original series for the new mount API came from an idea of 
Al/Miklos which was also presented at LSF/MM 2017, and this is a 
follow-on project. So it has not come out of nowhere; it has been 
discussed in various forums over a period of time.


Originally, there was a proposal to use netlink for the notifications; 
however, that didn't seem to meet with general approval, even though Ian 
Kent did some work towards figuring out whether that would be a useful 
direction to go in.


David has since come up with the proposal presented here, which is 
intended to improve on the original proposal in various ways - mostly 
making the notifications more efficient (i.e. smaller) and also generic 
enough that it might have uses beyond the original intent of just being 
a mount notification mechanism.


The original reason for the mount notification mechanism was to let us 
provide information to GUIs and similar filesystem and storage 
management tools, matching the state of the filesystem with the state of 
the underlying devices. This is part of a larger project, entitled 
"Project Springfield", to provide better management tools for storage 
and filesystems. I've copied David Lehman in, since he can provide a 
wider view on this topic.


It is something that I do expect will receive wide use, and which will 
be tested carefully. I know that Ian Kent has started work on support 
in libmount, for example, even outside of autofs.


We regularly hear from customers that better storage and filesystem 
management tools are something they consider very important, which is 
why we are putting so much effort into trying to improve them.

Re: [PATCH 18/25] gfs2: Convert to properly refcounting bdi

2017-04-12 Thread Steven Whitehouse

Hi,


On 12/04/17 09:16, Christoph Hellwig wrote:

On Wed, Mar 29, 2017 at 12:56:16PM +0200, Jan Kara wrote:

Similarly to set_bdev_super(), GFS2 just used the block device's 
reference to the bdi. Convert it to properly getting a bdi reference. 
The reference will get automatically dropped on superblock destruction.

Hmm, why isn't gfs2 simply using the generic mount_bdev code?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig 


It is, more or less. However, we ended up copying it because we needed a 
slight modification to cope with the metafs mounts. There may be scope 
to factor out the common parts, I guess. We cannot select the root 
dentry until after we've parsed the mount command line, so it is really 
just the last part of the function that is different,


Steve.



Re: [Lsf-pc] [LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces

2017-01-11 Thread Steven Whitehouse

Hi,


On 10/01/17 10:14, Jan Kara wrote:

Hi,

On Tue 10-01-17 09:44:59, Steven Whitehouse wrote:

I had originally thought about calling the proposal "kernel/userland
interface", however that seemed a bit vague, and "management interfaces"
seems like a better title since it is, I hope, a bit clearer about the kind
of thing that I'm thinking about in this case.

There are a number of possible sub-topics, and I hope that I might find a
few more before LSF too. One is space management: we have statfs, but
currently no notifications for thresholds crossed etc., so everything is
polled. That is ok sometimes, but statfs can be expensive in the case of
distributed filesystems if the results are to be 100% accurate. We could
just have ENOSPC notifications for 100% full, or something more generic.
Another is state transitions: is the fs running normally, or has it gone
read only/withdrawn/etc due to I/O errors? A further topic would be working
towards a common interface for fs statistics (at the moment each fs defines
its own interface). One potential implementation, at least for the first
two sub-topics, would be to use something along the lines of the quota
netlink interface, but since few ideas survive first contact with the
community at large, I'm throwing this out for further discussion and
feedback on whether this approach is considered the right way to go.

Assuming the topic is accepted, my intention would be to gather together
some additional sub-topics relating to fs management to go along with those
I mentioned above, and I'd be very interested to hear of any other issues
that could be usefully added to the list for discussion.

So this topic came up last year and probably the year before as well (heh,
I can even find some patches from 2011 [1]). I think the latest attempt at
what you suggest was here [2]. So clearly there's some interest in these
interfaces but not enough to actually drive anything to completion. So for
this topic to be useful, I think you need to go at least through the
patches in [2] and the comments on them and have a concrete proposal that can
be discussed and some commitment (not necessarily from yourself) that
someone is going to devote time to implement it. Because generally nobody
seems to be opposed to the abstract idea but once it gets to the
implementation details, it is non-trivial to get some wider agreement
(statx anybody? ;)).

Honza

[1] https://lkml.org/lkml/2011/8/18/170
[2] https://lkml.org/lkml/2015/6/16/456


Yes, statx is something else I'd like to see progress on too :-) Going 
back to this topic though, I agree wrt having a concrete proposal, and 
I'll try to have something ready for LSF; we have a few weeks in hand. 
I'll collect up the details of the previous efforts (including Lukas' 
suggestion) and see how far we can get in the meantime,


Steve.




[LSF/MM TOPIC] [LSF/MM ATTEND] FS Management Interfaces

2017-01-10 Thread Steven Whitehouse

Hi,

This is a request both to attend LSF/MM and also a topic proposal. For 
the last couple of years I've proposed a topic on the block/fs 
interface. I'm happy to do that again if there is consensus that this 
would be useful; however, this time I thought that I'd propose something 
a bit different.


I had originally thought about calling the proposal "kernel/userland 
interface", however that seemed a bit vague, and "management interfaces" 
seems like a better title since it is, I hope, a bit clearer about the 
kind of thing that I'm thinking about in this case.


There are a number of possible sub-topics, and I hope that I might find a 
few more before LSF too. One is space management: we have statfs, but 
currently no notifications for thresholds crossed etc., so everything is 
polled. That is ok sometimes, but statfs can be expensive in the case of 
distributed filesystems if the results are to be 100% accurate. We could 
just have ENOSPC notifications for 100% full, or something more generic. 
Another is state transitions: is the fs running normally, or has it gone 
read only/withdrawn/etc due to I/O errors? A further topic would be 
working towards a common interface for fs statistics (at the moment each 
fs defines its own interface). One potential implementation, at least 
for the first two sub-topics, would be to use something along the lines 
of the quota netlink interface, but since few ideas survive first 
contact with the community at large, I'm throwing this out for further 
discussion and feedback on whether this approach is considered the right 
way to go.
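
To make the status quo concrete: today the best a management tool can do is poll, along these lines (a sketch only; the mount point and the 95% threshold are arbitrary choices for the example):

 #include <stdio.h>
 #include <sys/statvfs.h>
 #include <unistd.h>

 int main(void)
 {
         struct statvfs st;

         /* status quo: poll and compute fullness ourselves; on a
            distributed filesystem each call may be expensive if the
            numbers are to be 100% accurate */
         for (;;) {
                 if (statvfs("/", &st) == 0) {
                         double used = 1.0 -
                                 (double)st.f_bavail / (double)st.f_blocks;
                         if (used >= 0.95)
                                 printf("threshold crossed: %.0f%% used\n",
                                        100.0 * used);
                 }
                 sleep(60);
         }
         return 0;
 }

A notification interface would replace the sleep-and-statfs loop with an event delivered when the threshold is crossed.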


Assuming the topic is accepted, my intention would be to gather together 
some additional sub-topics relating to fs management to go along with 
those I mentioned above, and I'd be very interested to hear of any other 
issues that could be usefully added to the list for discussion.


My interest in other topics is fairly wide... I'm, as usual, interested 
in all filesystem-related topics and a good number of block device and 
mm topics too: anything relating to vfs, xfs, ext*, btrfs, gfs2, 
overlayfs, NFS/CIFS, and technologies such as copy-offload, DAX, 
reflink, RDMA, NVMe(F), etc.,


Steve.
