Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Wed, Mar 18, 2015 at 02:48 AM, Isaac Dunham said: Is this manifested as the root device never shows up? Yes, although we call it the boot device. As the one who probably posted this, I can comment further: I've heard of one computer where this was an issue, a couple of years ago; a number of people on the Puppy Linux forums were experimenting with mdev, and one reported that the modalias file was missing with a Broadcom wireless card. That is interesting. Do you think this could have been due to a bug in the Broadcom driver? We don't do any networking in our initrd/initramfs so maybe we can get by with the faster methods that use modalias files instead of uevent files. The broken hotplugging (or whatever the problem is) is a bigger issue for us now even though it is small compared to the problems caused by the broken modprobe a year ago. However, you may find it worth noting that find /sys will get its list of files slightly later than globbing will. I don't know what the globbing solution is that you refer to. It took me a little while to understand the following solution from the alpinelinux initrd/initramfs, partly because it does not work here at all:

    find /sys -name modalias | xargs sort -u | xargs modprobe -a

It fails badly whenever one or more modalias files have a space in the path (which is the case here). That is easily remedied with:

    find /sys -name modalias -print0 \
        | xargs -0 sort -u \
        | xargs modprobe -a -q -b 2>/dev/null

This is probably more efficient than what I'm currently doing if /sys is replaced with /sys/devices. Thank you. Do you wait for the boot device to show up? Oh yes. It would not work if we didn't. I know the kernel has a rootwait parameter for similar issues with usb drives. We stopped using the related rootdelay years ago. We try to find the boot device as quickly as possible.
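The space-safe pipeline above can be sketched end to end. To keep it runnable anywhere, the sketch below scans a mock sysfs tree under a temp directory and uses a stub modprobe that just records the aliases it is asked to load; the mock paths, example alias strings, and the stub are my own assumptions. A real initramfs would root the find at /sys/devices and use the real modprobe.

```shell
#!/bin/sh
# Mock sysfs tree with a space in one path, plus a duplicate alias,
# to exercise both the -print0 handling and the sort -u deduplication.
tmp=$(mktemp -d)
mkdir -p "$tmp/sys/devices/pci0000:00/dev one" \
         "$tmp/sys/devices/pci0000:00/dev two" "$tmp/bin"
echo 'pci:v00001002d00004391' > "$tmp/sys/devices/pci0000:00/dev one/modalias"
echo 'usb:v0781p5567'         > "$tmp/sys/devices/pci0000:00/dev two/modalias"
echo 'pci:v00001002d00004391' > "$tmp/sys/devices/modalias"   # duplicate

# Stub modprobe: skip option flags, record each alias it would load.
cat > "$tmp/bin/modprobe" <<'EOF'
#!/bin/sh
for a in "$@"; do case "$a" in -*) ;; *) echo "$a" >> "$LOADED" ;; esac; done
EOF
chmod +x "$tmp/bin/modprobe"
export LOADED="$tmp/loaded" PATH="$tmp/bin:$PATH"

# The pipeline under discussion, rooted at the mock /sys/devices.
find "$tmp/sys/devices" -name modalias -print0 \
    | xargs -0 sort -u \
    | xargs modprobe -a -q -b 2>/dev/null

loaded=$(sort "$LOADED")
echo "$loaded"
rm -rf "$tmp"
```

Despite the space in "dev one", all three modalias files are read, and the duplicate pci alias is loaded only once.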
I don't think rootdelay is applicable unless root= is specified (or something like that) but even if it were, making users guess how long it's going to take for the buses to settle (or whatever) seemed overly crude. We have a fixed timeout of 10 or 15 seconds. We keep trying to find and mount the boot device until the timeout is reached. If we can't find it within that time window then we give up. As long as the correct modules get loaded, this scheme works well. It tends to boot as quickly as possible without adding unnecessary delays. Of course, the root device may be unknown depending on what you're doing, which would make waiting properly rather difficult. We have several different modes. The default mode is to scan all usb/removable devices and all cdrom/dvd devices, mounting each in turn, looking for the file(s) we need. This is repeated until the file is found or the timeout is reached. You can also specify the boot device with a label or a uuid (even a partial uuid) or a device name like sdb1. You can also specify the type of device: cd (includes dvd), usb (includes non-usb removable devices), or hd (internal drives). On machines that can't boot directly from usb, the from=usb boot parameter has proved to be popular. The boot starts with a LiveCD but then uses a compatible usb stick as the boot device, which is faster than the LiveCD and makes other features available such as persistence and remastering. (What I'd be inclined to do is check for the root and if it fails, coldplug a second time/blindly load sd_mod and usb_storage) My latest solution is to coldplug inside the loop that looks for the boot device. When we were experiencing the bug in the smaller busybox modprobe about a year ago we tried various schemes of always loading certain modules but that was not very satisfactory. It masks the problem instead of fixing it.
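The scheme described above (a fixed timeout, with a coldplug inside the loop that probes for the boot device) can be sketched in a few lines. The names coldplug and find_boot_device are hypothetical stand-ins, not code from the thread; they are stubbed here so the sketch runs, with the stub "finding" the device on the third attempt.

```shell
#!/bin/sh
# Hedged sketch of "coldplug inside the loop that looks for the boot device".
tries=0
coldplug() { :; }          # real code: scan modalias files and modprobe them
find_boot_device() {
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]     # stub: pretend the device shows up on attempt 3
}

TIMEOUT=10                 # fixed timeout, as in the message above
start=$(date +%s)
while :; do
    coldplug                          # pick up hardware registered since last pass
    if find_boot_device; then
        echo "boot device found after $tries attempts"
        break
    fi
    if [ $(( $(date +%s) - start )) -ge "$TIMEOUT" ]; then
        echo "giving up after ${TIMEOUT}s" >&2
        exit 1
    fi
    sleep 1
done
```

Because the loop exits as soon as the device appears, it adds no delay on fast hardware while still tolerating slow buses up to the timeout.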
If loading of hardware specific modules is not 100% reliable then where do you draw the line on which specific modules to load on every machine? Likewise, loading a bunch of modules after a delay can in some cases just further postpone the eventual failure. ISTM repeated coldplugging is a reasonable compromise even if it is not as elegant as hotplugging. At least it only loads modules that correspond to the hardware. Thank you for your help. This discussion has been useful to me. Peace, James ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Thu, Mar 19, 2015 at 06:02:33PM -0600, James Bowlin wrote: On Wed, Mar 18, 2015 at 02:48 AM, Isaac Dunham said: Is this manifested as the root device never shows up? Yes, although we call it the boot device. As the one who probably posted this, I can comment further: I've heard of one computer where this was an issue, a couple of years ago; a number of people on the Puppy Linux forums were experimenting with mdev, and one reported that the modalias file was missing with a Broadcom wireless card. That is interesting. Do you think this could have been due to a bug in the Broadcom driver? We don't do any networking in our initrd/initramfs so maybe we can get by with the faster methods that use modalias files instead of uevent files. The broken hotplugging (or whatever the problem is) is a bigger issue for us now even though it is small compared to the problems caused by the broken modprobe a year ago. However, you may find it worth noting that find /sys will get its list of files slightly later than globbing will. I don't know what the globbing solution is that you refer to. Sorry, your reference to /sys/devices/ made me semi-remember this bit (again, from the Puppy Linux forums):

    grep -h ^MODALIAS= /sys/bus/*/devices/*/uevent | cut -c 10-

See: http://www.murga-linux.com/puppy/viewtopic.php?t=78941&start=210 It took me a little while to understand the following solution from the alpinelinux initrd/initramfs, partly because it does not work here at all:

    find /sys -name modalias | xargs sort -u | xargs modprobe -a

It fails badly whenever one or more modalias files have a space in the path (which is the case here). That is easily remedied with:

    find /sys -name modalias -print0 \
        | xargs -0 sort -u \
        | xargs modprobe -a -q -b 2>/dev/null

This is probably more efficient than what I'm currently doing if /sys is replaced with /sys/devices. Thank you. Ah.
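The grep-on-uevent variant quoted above can be demonstrated the same way. The mock /sys/bus tree and the example MODALIAS values below are invented for illustration; on a real system the glob would be /sys/bus/*/devices/*/uevent and the result would feed xargs modprobe:

```shell
#!/bin/sh
# Mock /sys/bus tree: uevent files carry MODALIAS= lines among other keys.
BUS=$(mktemp -d)
mkdir -p "$BUS/pci/devices/0000:00:1f.3" "$BUS/usb/devices/1-1"
printf 'DRIVER=snd_hda_intel\nMODALIAS=pci:v8086d9D71\n' \
    > "$BUS/pci/devices/0000:00:1f.3/uevent"
printf 'MODALIAS=usb:v0781p5567\n' > "$BUS/usb/devices/1-1/uevent"

# grep -h drops the filename prefix; cut -c 10- strips "MODALIAS=".
aliases=$(grep -h '^MODALIAS=' "$BUS"/*/devices/*/uevent | cut -c 10- | sort -u)
echo "$aliases"
rm -rf "$BUS"
```

A trailing `| xargs modprobe -a -q -b` would then load the modules. Since globbing only looks at fixed depth under /sys/bus, it avoids walking the whole /sys tree, which is the speed difference Isaac alludes to.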
Where SYSBASE is the directory you're searching in, you might prefer to use:

    find $SYSBASE -name modalias -exec sort -u '{}' + | xargs modprobe ...

(i.e., use find -exec instead of find -print0 | xargs -0). FYI: I find that there are uevents in /sys/bus/ and /sys/module/, but no modalias files. [snip] My latest solution is to coldplug inside the loop that looks for the boot device. When we were experiencing the bug in the smaller busybox modprobe about a year ago we tried various schemes of always loading certain modules but that was not very satisfactory. It masks the problem instead of fixing it. If loading of hardware specific modules is not 100% reliable then where do you draw the line on which specific modules to load on every machine? Likewise, loading a bunch of modules after a delay can in some cases just further postpone the eventual failure. That sounds like a good course. ISTM repeated coldplugging is a reasonable compromise even if it is not as elegant as hotplugging. At least it only loads modules that correspond to the hardware. Thank you for your help. This discussion has been useful to me. Glad to be of help. Thanks, Isaac Dunham
Re: RFD: Rework/extending functionality of mdev
Le 18/03/2015 18:41, Laurent Bercot wrote: On 18/03/2015 18:08, Didier Kryn wrote: No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost. I skim over that discussion (because I don't agree with the design) so I can't make any substantial comments, but here's a nitpick: if you use an asynchronous event loop, your selector triggers (POLLHUP for poll(); not sure if it's writability or exception for select()) as soon as a pipe is broken. Hi Laurent. My experience is that select() never gives you anything via the exception set, on Linux. And fifosvd must close the write end of the pipe; therefore it cannot poll for writability. Didier
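Didier's point, that a writer which never polls only learns the pipe is broken when it actually writes, can be shown in a few lines of shell. This is an illustration of general pipe semantics (my own demo, not nldev code); the fifo name is arbitrary:

```shell
#!/bin/sh
# The reader consumes one line and exits; the writer gets no notification
# until its next write, which then fails with SIGPIPE/EPIPE.
fifo="${TMPDIR:-/tmp}/pipe-demo-$$"
mkfifo "$fifo"
( read -r line < "$fifo" ) &    # reader: consume one line, then exit
exec 3> "$fifo"                 # writer end (open blocks until reader opens)
echo "event 1" >&3              # delivered; the reader then exits
wait                            # reader is gone, but nothing told us yet
trap '' PIPE                    # ignore SIGPIPE so the write fails with EPIPE
if echo "event 2" >&3 2>/dev/null; then
    result="write ok"
else
    result="broken pipe detected on write"
fi
echo "$result"
exec 3>&-
rm -f "$fifo"
```

So between "event 1" and "event 2" the writer sits in blissful ignorance, which is exactly why an event can be lost: by the time the write fails, the message it carried has nowhere to go.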
Re: RFD: Rework/extending functionality of mdev
Le 18/03/2015 20:01, Harald Becker wrote: Do you think it matters losing one more event? Here we are considering the case when fifosvd is killed (say by admin's error). I understand lost events can be recovered. However there is one distinctive advantage in detecting immediately the death of fifosvd: nldev can die immediately, causing a new working chain to be established immediately, *before* a possible burst of events. This avoids forking 3 daemons just when the next event happens. The necessary code in nldev consists only in invoking sigaction() with a trivial intercept function and testing some flag on return from any blocked state (poll/read/write). Note that you probably already want this sigaction and intercept to capture SIGTERM. This is fine as long as the netlink reader keeps control on its exit, not if it's killed. And when netlink is killed, then it is the responsibility of the higher instance to bring the required stuff up again. Sure, we agreed on that. But living orphans should not be left behind, and, in this respect, it is nldev which is in charge of fifosvd; the higher instance can't do it. This netlink reader you describe is not the general tool we were considering up to now, the simple data funnel. My pseudo code described the principal operation and data flow, not every gory detail of failure management. So the netlink reader described here is what I last called "netlink the Unix way". If the idea is to integrate such peculiarities as execing a script, then it is not the general tool, and why not integrate as well the supervision of mdev-i instead of needing fifosvd. The reason for fifosvd was AFAIU to associate general tools, nldev and mdev-i. ??? Don't know if I fully understand you here. And why shall execing a failure script violate making netlink a general tool? consider: nldev -e /path/to/failure/script I must say two things: First, I didn't understand correctly what you had written and didn't appreciate the -e option.
Second, I don't know what you include in the failure management, but I think part of it should be to get rid of the child. Doing it is going to be complicated in the script; at least you need to pass the pid because it is unknown to the shell. Instead, it is pretty simple in nldev: you just need to invoke wait() and syslog the exit status. The purpose of wait() isn't to check the pid of the process - we know who it is -, it's to remove the zombie, and get its exit status. This logic does no harm, whatever the way nldev is invoked. Even if it hasn't inherited a child, wait() returns immediately. I agree, though, that a comprehensive parsing of the status would take some lines of code. With maybe a default of /sbin/nldev-fail. Maybe with a default behaviour of not execing anything - this option must be provided in some way. I skip the rest of the discussion because I would repeat the same things :-) And we agree that fifosvd can know the pipe is broken from the return code of the handler, and it's enough to have one way to know it. Didier
Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Tue, 17 Mar 2015 17:51:39 -0600 James Bowlin bit...@gmail.com wrote: On Mon, Mar 16, 2015 at 09:55 AM, Natanael Copa said: On Fri, 13 Mar 2015 13:12:56 -0600 James Bowlin bit...@gmail.com wrote: TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach. ... My current plan is to make repeated coldplugging the default method for loading modules since in every case I've been able to investigate, an alias for the missing module(s) is in the output of the find command I gave above. I haven't yet done exhaustive testing but every test with repeated coldplugging has worked even on the system where the hotplugging is flaky (and which is now out of my reach). Interestingly, this is what we do on alpine linux too: http://git.alpinelinux.org/cgit/mkinitfs/tree/initramfs-init.in#n530 So apparently we must have had the same problem in the past. My current plan is to use netlink to serialize the hotplug events and instead of scanning /sys/devices for modalias entries, I'll trigger 'add' uevents. (I will also make the netlink listener stop as soon as it has collected all needed bits to set up the root fs) I don't know why hotplugging works most of the time but not all the time. It could be a race condition. I wonder if the problem persists if you serialize mdev using /dev/mdev.seq ISTM as long as I start the hotplugging before I do the first (and used to be only) coldplug, there is not a lot I can mess up. Another check is that with hotplugging disabled and only a single coldplug then there are very few (if any?) situations where it will boot since it usually takes a few seconds for the boot device to show up and that first coldplug is done ASAP. Peace, James
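Natanael's "trigger 'add' uevents" approach works by writing the string add into each device's uevent file, which makes the kernel re-emit that device's add event for the netlink listener to serialize. A minimal sketch, run against a mock tree so it needs no root; on a real system the find would walk /sys/devices:

```shell
#!/bin/sh
# Mock /sys/devices tree standing in for the real one.
tmp=$(mktemp -d)
mkdir -p "$tmp/devices/pci0000:00/0000:00:02.0"
: > "$tmp/devices/pci0000:00/0000:00:02.0/uevent"

# Coldplug by uevent triggering: on a real /sys, each write makes the
# kernel replay that device's "add" event over netlink.
find "$tmp/devices" -name uevent | while read -r f; do
    echo add > "$f"
done

triggered=$(cat "$tmp/devices/pci0000:00/0000:00:02.0/uevent")
echo "$triggered"
rm -rf "$tmp"
```

Compared with scanning modalias files and calling modprobe directly, this routes everything through the same netlink/hotplug path, so cold- and hot-plugged devices are handled by one code path.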
Re: RFD: Rework/extending functionality of mdev
Le 17/03/2015 19:56, Harald Becker wrote: Hi Didier, On 17.03.2015 19:00, Didier Kryn wrote: The common practice of daemons putting themselves in the background and orphaning themselves is starting to become disapproved of by many designers. I tend to share this opinion. If such a behaviour is desired, it may well be done in the script (nohup), and the go-to-background feature be completely removed from the daemon proper. The idea behind this change is to allow the supervisor not to be process #1. Ack, for the case where the daemon does not allow being used with an external supervisor. Invoking a daemon from scripts is no problem, but did you ever come into a situation where you needed to maintain a system by hand? Therefore I personally vote for having a simple command doing auto-backgrounding of the daemon, allowing it to run from a supervisor via a simple extra parameter (e.g. -n). Which is usually no problem, as the supervisor needs some kind of configuration where you should be able to add the arguments the daemon gets started with. So you have to enter that parameter just once for your usage from the supervisor, but save extra parameters for manual invocation. Long lived daemons should have both startup methods, selectable by a parameter, so you make nobody's work more difficult than required. Dropping the auto-background feature would mean saving a single function call to fork and maybe an exit. This will result in a saving of roughly 10 to 40 bytes in the binary (typical x86 32-bit). Too much cost to allow both usages? OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand and it is OK if there is a command-line switch to avoid all of it. But there must be this switch. Could you clarify, please: do you mean implementing in netlink the logic to restart fifosvd? Previously you described it as just a data funnel.
No, restart is not required: as netlink dies when fifosvd dies (or later on when the handler dies), the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts up the netlink reader directly, but watches the process it starts up for xdev. xdev does its initial action (startup code) then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources not needed by the long lived daemon but required by the startup code. For the supervisor instance this looks like a single process it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started the fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and the technical placement of the code). I didn't notice this trick in your description. It is making more and more sense :-). Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it shall receive SIGCHLD if fifosvd dies. This is the best way for nldev to watch fifosvd. Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCHLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd. Well, this is what I thought, but the manual says a closed write end causes end-of-file, without mentioning that the pipe must first be empty. End-of-file always includes the pipe being empty. Consider a pipe which still has some data in it when the writer closes the write end.
If the reader received EOF before all data had been consumed, it would lose some data. That would be absolutely unreliable. Therefore, the EOF is only forwarded to the read end when the pipe is empty. I agree that the other way wouldn't work. Just noticing the manual is wrong/unclear on that point. *Does anybody know the exact specification of poll behavior in this case?* My experience, with select() which is roughly the same, is that it does not detect EOF. And, since fifosvd must not read the pipe, how does it detect that it is broken? Not detect? Are you sure you closed all open file descriptors for the write end (a common caveat)? I have never been hit by such a case, except when someone forgot to close all file descriptors of the write end. You notice that something happened on input (AFAIR) but I'm sure you don't know what. It may be data as well. You must read() to know. Anyway you don't want to poll() the pipe unless mdev-i is dead
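The claim that EOF is delivered only after the pipe drains is easy to check from shell. This small demo (my own, not from the thread) has the writer exit long before the reader starts reading:

```shell
#!/bin/sh
# printf exits immediately; the reader sleeps first, yet still receives
# both buffered lines before the third read() finally returns EOF.
out=$(
    printf 'one\ntwo\n' | {
        sleep 1            # writer has exited long before we read
        read -r a          # still delivered
        read -r b          # still delivered
        if read -r c; then
            echo "unexpected extra data"
        else
            echo "EOF only after draining: $a $b"
        fi
    }
)
echo "$out"
```

So a handler draining a fifo after fifosvd's writer side has closed will still see every queued event before it sees end-of-file, which is exactly the property the design above relies on.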
Re: RFD: Rework/extending functionality of mdev
On 18.03.2015 10:42, Didier Kryn: Long lived daemons should have both startup methods, selectable by a parameter, so you make nobody's work more difficult than required. OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand and it is OK if there is a command-line switch to avoid all of it. But there must be this switch. Ack! No, restart is not required: as netlink dies when fifosvd dies (or later on when the handler dies), the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts up the netlink reader directly, but watches the process it starts up for xdev. xdev does its initial action (startup code) then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources not needed by the long lived daemon but required by the startup code. For the supervisor instance this looks like a single process it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started the fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and the technical placement of the code). I didn't notice this trick in your description. It is making more and more sense :-). I left it out to make it not unnecessarily complicated, and I wanted to focus on the netlink / pipe operation. Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it shall receive SIGCHLD if fifosvd dies. This is the best way for nldev to watch fifosvd.
Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCHLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd. The netlink reader (nldev) does not need to explicitly watch the fifosvd via SIGCHLD. Either that piece of code does its job, or it fails and dies. When fifosvd dies, the read end of the pipe is closed (by the kernel), unless there is still a handler process (which shall process remaining events from the pipe). As soon as there is neither a fifosvd nor a handler process, the pipe is shut down by the kernel, and nldev gets an error when writing to the pipe, so it knows the other end died. You won't gain much benefit from watching SIGCHLD and reading the process status. It will either give you the information that the fifosvd process is still running, or that it died (failed). The same information you get from the write to the pipe: when the read end dies, you get EPIPE. Limiting the time nldev tries to write to the pipe would also allow detecting stuck operation of fifosvd / handler (which SIGCHLD watching won't give you) ... but (in parallel I discussed that with Laurent) the question is how to react when the write to the pipe is stuck (but no failure)? We can't do much here, and are in trouble either way, but Laurent gave the argument: the netlink socket also contains a buffer, which may hold additional events, so we do not lose them in case processing continues normally.
The details of the actions taken in this case need to, and can, be discussed (and may be adapted later) without much impact on other operation. This clearly means I'm open to suggestions on which kind of failure handling shall be done. Every action taken to improve reaction, which is of benefit for the major purpose of the netlink reader without blowing it up needlessly, is of interest (keep in mind: long lived daemon, trying to keep it simple and small). My suggestion is: let the netlink reader detect relevant errors and exec (not spawn) a script of a given name when there are failures. This is small, and gives the invoked script full control over the failure management (no fixed functionality in a binary). When done, it can either die, letting a higher instance do the restart job, or exec back and re-start the hotplug system (maybe with a different mechanism). When the script does not exist, the default action is to exit the netlink reader process unsuccessfully, giving a higher instance a failure indication and the possibility to react on it. Not detect? Are you sure you closed all open file descriptors for the write end (a common caveat)?
Re: RFD: Rework/extending functionality of mdev
On 18/03/2015 18:08, Didier Kryn wrote: No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost. I skim over that discussion (because I don't agree with the design) so I can't make any substantial comments, but here's a nitpick: if you use an asynchronous event loop, your selector triggers (POLLHUP for poll(); not sure if it's writability or exception for select()) as soon as a pipe is broken. Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent. But that is a risk you have to take every time you perform buffered IO; there's no way around it. -- Laurent
Re: RFD: Rework/extending functionality of mdev
Hi Laurent ! Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent. But that is a risk you have to take every time you perform buffered IO; there's no way around it. To make clear what case we are talking about, that is:

- spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- ...
- detect failure loop
- spawn failure script
- exit with failure or non-zero status
- giving up, close read end of pipe
- let fifosvd die

@Laurent: What would you do in that case? Endless respawn? - shrek! -- Harald
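One possible answer to "endless respawn?" is a capped respawn loop. This is only my reading of the failure sequence above, not code from the thread; run_handler is a hypothetical stub that always fails so the sketch terminates:

```shell
#!/bin/sh
# Re-spawn the handler on failure, but detect a failure loop after a few
# consecutive failures instead of respawning forever.
MAX_FAILS=3
fails=0
run_handler() { return 1; }   # stand-in for spawning the conf parser / handler

while :; do
    if run_handler; then
        fails=0               # a successful run resets the failure counter
        continue
    fi
    fails=$((fails + 1))
    if [ "$fails" -ge "$MAX_FAILS" ]; then
        echo "failure loop after $fails consecutive failures; giving up" >&2
        break                 # real code: run the failure script, close the pipe
    fi
done
echo "fifosvd exits; nldev will then see the pipe break on its next write"
```

Resetting the counter on success distinguishes a genuine failure loop (broken conf file, broken binary) from occasional crashes, which is the distinction the sequence above needs before giving up and letting fifosvd die.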
Re: RFD: Rework/extending functionality of mdev
Hi Laurent ! On 18/03/2015 18:08, Didier Kryn wrote: No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost. On 18.03.2015 18:41, Laurent Bercot wrote: I skim over that discussion (because I don't agree with the design) Why? Did you note my last two alternatives, unexpectedly both named #3? ... but specifically the last one, "netlink the Unix way"?

- it uses a private pipe for netlink and a named pipe for the hotplug helper (with a maximum of code sharing)
- it should most likely follow the flow of operation you suggested (as far as I understood you)
- except that I split off the pipe watcher / on-demand startup code of the conf parser / device operation into its own thread (process), for general code usability as a different applet for on-demand pipe consumer startup purposes (you had that function as an integral part of your netlink reader)
- and I'm currently going to split off that one-shot xdev init feature from xdev, creating an own applet / command for this, as you suggested (extending functionality for even more general usage, as suggested by Isaac, independent from the device management, and maybe modifiable in its operation by changing functions in a shell script)

So why do you still have doubts about the design? ... because I moved some code into its own (small) helper thread? I can't make any substantial comments, but here's a nitpick: if you use an asynchronous event loop, your selector triggers (POLLHUP for poll(); not sure if it's writability or exception for select()) as soon as a pipe is broken. This is what I expected, but the problem is that the question about this arrived, and I can't find the location where this behaviour is documented. Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent.
But that is a risk you have to take every time you perform buffered IO; there's no way around it. Ok, what would you then do? Unbuffered I/O on the pipe, and then what? ... If that single extra dropped message matters (besides the others not yet read from the netlink buffer, which are lost on close anyway), then we shall indeed use unbuffered I/O on the pipe, and only read a message when there is room for one more message in the pipe:

    set non-blocking I/O on stdout
    establish netlink socket
    loop:
        poll until write on stdout is possible
            (may set an upper timeout limit; failure on timeout)
        poll for netlink read, and still for write on stdout
            if writability drops, we are in serious trouble: failure
        if netlink read is possible:
            gather message from netlink
            write message to stdout (should never block)
            on EAGAIN, EINTR: do 3 write retries, then failure

... does that fit better? I don't think that it makes a big difference, but I can live with the slightly bigger code. My problem is not the detection of the failing pipe write, but the reaction to it. When that happens, the chain downstream of the pipe most likely needs more than just a restart. That is, it should only happen on serious failure in the conf file or the device operations (-> manual action required). So I expect more loss of event messages than just that single message you were grumbling about. Hence on hotplug restart we need to re-trigger the plug events nevertheless! -- Harald
Re: RFD: Rework/extending functionality of mdev
Le 18/03/2015 13:34, Harald Becker wrote: On 18.03.2015 10:42, Didier Kryn wrote: Long lived daemons should have both startup methods, selectable by a parameter, so you make nobody's work more difficult than required. OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand and it is OK if there is a command-line switch to avoid all of it. But there must be this switch. Ack! No, restart is not required: as netlink dies when fifosvd dies (or later on when the handler dies), the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts up the netlink reader directly, but watches the process it starts up for xdev. xdev does its initial action (startup code) then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources not needed by the long lived daemon but required by the startup code. For the supervisor instance this looks like a single process it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started the fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and the technical placement of the code). I didn't notice this trick in your description. It is making more and more sense :-). I left it out to make it not unnecessarily complicated, and I wanted to focus on the netlink / pipe operation. Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it shall receive SIGCHLD if fifosvd dies. This is the best way for nldev to watch fifosvd.
Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCHLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd. The netlink reader (nldev) does not need to explicitly watch the fifosvd via SIGCHLD. Either that piece of code does its job, or it fails and dies. When fifosvd dies, the read end of the pipe is closed (by the kernel), unless there is still a handler process (which shall process remaining events from the pipe). As soon as there is neither a fifosvd nor a handler process, the pipe is shut down by the kernel, and nldev gets an error when writing to the pipe, so it knows the other end died. No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost. You won't gain much benefit from watching SIGCHLD and reading the process status. It will either give you the information that the fifosvd process is still running, or that it died (failed). The same information you get from the write to the pipe: when the read end dies, you get EPIPE. You get the information immediately from SIGCHLD. You get it too late from the pipe, and you lose at least one event for sure, a whole burst if there is one. Limiting the time nldev tries to write to the pipe would also allow detecting stuck operation of fifosvd / handler (which SIGCHLD watching won't give you) ... but (in parallel I discussed that with Laurent) the question is how to react when the write to the pipe is stuck (but no failure)? We can't do much here, and are in trouble either way, but Laurent gave the argument: the netlink socket also contains a buffer, which may hold additional events, so we do not lose them in case processing continues normally.
When the kernel buffer fills up to its limit, let the kernel react to the problem. Sure; the limit here is pipe size (adjustable) + netlink buffer size. ... otherwise you are right: nldev's job is to detect failure of the rest of the chain (that is, to supervise it), and it has to react to this. The details of the actions taken in this case need to be, and can be, discussed (and may be adapted later) without much impact on the rest of the operation. This clearly means I'm open to suggestions on which kind of failure handling shall be done. Every action that improves the reaction, and that benefits the major purpose of the netlink reader without blowing it up needlessly, is of interest (keep in mind: a long-lived daemon, trying to stay simple and small). My suggestion is: let the netlink reader detect relevant errors, and exec (not spawn) a script of a given name when there are failures. This is small, and gives the invoked script full control over the failure management (no fixed functionality in a binary). When
Re: RFD: Rework/extending functionality of mdev
Hi Didier, On 17.03.2015 19:00, Didier Kryn wrote: The common practice of daemons putting themselves in the background and orphaning themselves is starting to be disapproved of by many designers. I tend to share this opinion. If such a behaviour is desired, it may well be done in the script (nohup), and the go-to-background feature can be completely removed from the daemon proper. The idea behind this change is to allow for a supervisor that is not process #1. Ack, for the case where the daemon otherwise cannot be used with an external supervisor. Invoking a daemon from scripts is no problem, but did you ever end up in a situation where you needed to maintain a system by hand? Therefore I personally vote for having the simple command auto-background the daemon, while allowing it to run under a supervisor via a simple extra parameter (e.g. -n). This is usually no problem, as the supervisor needs some kind of configuration anyway, where you should be able to add the arguments the daemon gets started with. So you have to enter that parameter just once for supervisor usage, but save the extra parameters for manual invocation. Long-lived daemons should have both startup methods, selectable by a parameter, so you don't make anybody's work more difficult than required. Dropping the auto-background feature would mean saving a single function call to fork and maybe an exit. This results in a saving of roughly 10 to 40 bytes in the binary (typical x86 32-bit). Too high a cost to allow both usages? Could you clarify, please: do you mean implementing in netlink the logic to restart fifosvd? Previously you described it as just a data funnel. No, restart is not required: netlink dies when fifosvd dies (or later, when the handler dies), and the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev).
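The "-n" convention being argued for can be sketched as follows; run_daemon is a hypothetical stand-in, and the echoes stand in for the real daemonization steps (fork, new session, detach from the terminal):

```shell
# Sketch of the '-n' switch convention: the same program either
# backgrounds itself (default, convenient for manual use) or stays in
# the foreground so a supervisor can watch it. All names are stand-ins.
run_daemon() {
    if [ "$1" = "-n" ]; then
        # supervisor mode: remain in the foreground
        echo "foreground (supervised)"
    else
        # manual mode: a real daemon would fork and detach (setsid) here
        ( echo "background (self-detached)" ) &
        wait
    fi
}
run_daemon -n
run_daemon
```

The cost of supporting both modes is essentially the one fork/exit pair mentioned above.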
The supervisor never starts the netlink reader directly; it watches the process it started for xdev. xdev does its initial action (startup code), then chains (execs) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a well-known practical trick to drop memory resources that are needed by the startup code but not by the long-lived daemon. To the supervisor this looks like a single process, which it started and can watch until it exits. So from that view it looks as if netlink created the pipe and started fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and the technical placement of the code). Well, this is what I thought, but the manual says a closed write end causes end-of-file, without mentioning the pipe being empty. End-of-file always includes the pipe being empty. Consider a pipe which still has some data in it when the writer closes the write end. If the reader received EOF before all data had been consumed, it would lose some data. That would be absolutely unreliable. Therefore, EOF is only delivered to the read end when the pipe is empty. *Does anybody know the exact specification of poll behaviour in this case?* My experience with select(), which is roughly the same, is that it does not detect EOF. And, since fifosvd must not read the pipe, how does it detect that it is broken? Not detect it? Are you sure you closed all open file descriptors for the write end (a common caveat)? I have never been hit by such a case, except when someone forgot to close all file descriptors of the write end. No, they should still be processed by the handler, which then stumbles on the EOF once all event messages have been read. See above. It would make sense, but the manual does not say that. I bet the manual is wrong in this case. It's the established behaviour of pipes in the Unix world; the specification of this may go back to K&R in the 1970s. PS.
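The claim above - that EOF reaches the reader only after the pipe has drained - is easy to demonstrate from the shell (a minimal illustration, not the mdev code):

```shell
# The writer exits immediately after writing; the reader nevertheless
# receives all three lines before seeing EOF, because EOF is delivered
# only once the pipe is empty.
count=$(printf 'one\ntwo\nthree\n' | wc -l)
echo "lines read before EOF: $count"
```

If EOF were delivered as soon as the writer closed its end, data still buffered in the pipe would be lost; the count shows it is not.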
I inadvertently went off-list. Just my habit of clicking reply. I leave it up to you to take it back to the list. I tried to set a CC to the list, but got a response that the message had been put on hold, so I am trying it the other way round. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Tue, Mar 17, 2015 at 05:51:39PM -0600, James Bowlin wrote: On Mon, Mar 16, 2015 at 09:55 AM, Natanael Copa said: On Fri, 13 Mar 2015 13:12:56 -0600 James Bowlin bit...@gmail.com wrote: TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach. Is this manifested as the root device never showing up? Do you use mdev as kernel helper? Yes. We specifically use it to load modules associated with newly discovered hardware when our live system first boots. This works just fine 99+% of the time. Alpine linux initramfs currently does: find /sys -name modalias | xargs sort -u | xargs modprobe -a using busybox modprobe. That expression doesn't look right, for several reasons. About six weeks ago I thought a suggestion from the following alpinelinux post would solve my problems: http://article.gmane.org/gmane.linux.distributions.alpine.devel/2791 Or it could be the kernel creating the uevent but not the modalias file--which is a known issue with some hardware; the workaround is to use something like this: find /sys -name uevent -exec grep -h MODALIAS= '{}' + | sort -u | cut -c 10- | xargs modprobe -a As the one who probably posted this, I can comment further: I've heard of one computer where this was an issue, a couple of years ago; a number of people on the Puppy Linux forums were experimenting with mdev, and one reported that the modalias file was missing with a Broadcom wireless card. But that wasn't the answer. The find -exec sed code I gave previously is a streamlined version of this. As I said, it made no difference. Also, I was curious whether you found that using /sys was ever better than using /sys/devices.
It is certainly slower, but I haven't seen any systems where scanning /sys gives any more information than scanning /sys/devices after passing the aliases through sort -u. It didn't make a difference on the two systems with problems. I'm not aware of any cases where /sys/devices was problematic, and that was another method that was recommended on the Puppy Linux forums. However, you may find it worth noting that find /sys will get its list of files slightly later than globbing will. I suspected one system had a problem loading the sd-mod module even though the alias for this module showed up in the output of: find /sys/devices -name modalias -exec cat '{}' + on their system. One strange thing that makes me wonder if busybox [snip] I'm not even sure sd-mod was the missing module. It may well have been the usb-storage module, which I know was missing (not loaded) on a 2nd system with problems. In the past few days I've got my hands on a system where, depending on the hardware (usb device used and usb slot used), the usb-storage module sometimes does not get loaded, which means the boot device does not show up. Do you wait for the boot device to show up? I know the kernel has a rootwait parameter for similar issues with usb drives. Of course, the root device may be unknown depending on what you're doing, which would make waiting properly rather difficult. (What I'd be inclined to do is check for the root and, if that fails, coldplug a second time / blindly load sd_mod and usb_storage.) My current plan is to make repeated coldplugging the default method for loading modules, since in every case I've been able to investigate, an alias for the missing module(s) is in the output of the find command I gave above. I haven't yet done exhaustive testing, but every test with repeated coldplugging has worked, even on the system where the hotplugging is flaky (and which is now out of my reach). HTH, Isaac Dunham
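The repeated-coldplug plan amounts to a bounded retry loop. Everything below is hypothetical stand-in code: coldplug() only comments where the modalias scan and modprobe would go, and the boot-device check is faked with a path that does not exist, so the sketch is safe to run anywhere.

```shell
# Bounded retry loop: coldplug, check for the boot device, repeat until
# found or the timeout expires. Stand-ins keep the sketch runnable.
coldplug() {
    # real version (roughly):
    #   find /sys/devices -name modalias -exec cat '{}' + 2>/dev/null \
    #       | sort -u | xargs modprobe -a -b -q 2>/dev/null
    :
}
boot_device_present() {
    [ -e /nonexistent-bootdev ]   # stand-in for probing/mounting candidates
}
timeout=3 found=no
while [ "$timeout" -gt 0 ]; do
    coldplug
    if boot_device_present; then found=yes; break; fi
    sleep 1
    timeout=$((timeout - 1))
done
echo "boot device found: $found"
```

The loop exits as soon as the device appears, so it adds no delay on the common path; only the failure case waits out the full timeout.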
Re: RFD: Rework/extending functionality of mdev
Le 16/03/2015 19:18, Harald Becker a écrit : On 16.03.2015 10:15, Didier Kryn wrote: 4) netlink reader the Unix way Why let our netlink reader bother about where it sends the event messages? Just let it do its netlink reception job and forward the messages to stdout.

netlink reader:
    set stdout to non-blocking I/O
    establish netlink socket
    wait for event messages
    gather event information
    write message to stdout

hotplug startup code:
    create a private pipe
    spawn netlink reader, redirect stdout to write end of pipe
    spawn fifosvd - xdev parser, redirect stdin from read end of pipe
    close both pipe ends (write end open in netlink, read end in fifosvd)

1) Why not let fifosvd act as the startup code? It is anyway the supervisor of the processes at both ends of the pipe, and in charge of re-spawning them in case they die. The netlink receiver should be restarted immediately so as not to miss events, while the event handler should be restarted on event (see comment below). This would make fifosvd specific to the netlink / hotplug function. My intention is to get a generally usable tool. I had not caught the point that you wanted it to be a general-purpose tool - sorry. You won't gain anything otherwise, as the startup of the daemon has to be done regardless. It does not matter whether you start fifosvd, which then forks again to bring itself into the background, and forks again to start the netlink part, or do it slightly differently: start an initial code snippet that does the pipe creation and the forks (starting the daemons in the background), then steps away. It is the same operation, only moved around a bit, but possibly without blocking other usages. Sure, it is the same. My point was about supervision. The netlink reader is a long-lived daemon. It shall not exit, and shall handle failures internally where possible; but if it fails, purely restarting it, without other intervening action to control / correct the reason for the failure, doesn't look like a good choice.
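Incidentally, the hotplug startup code in the pseudocode above is exactly what a shell does for a pipeline: create a private pipe, spawn both processes with the right ends on stdout/stdin, and close its own copies. A runnable miniature, with stand-ins for nldev and the fifosvd/parser side:

```shell
# 'nldev | fifosvd' in miniature: the shell plays the startup code here.
nldev()   { printf 'add /devices/demo\n'; }   # stand-in netlink reader
fifosvd() {                                   # stand-in fifosvd + handler
    while read -r ev; do echo "handler got: $ev"; done
}
nldev | fifosvd
```

This is why the private-pipe variant needs no named fifo at all: the pipe exists only between the two spawned processes.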
So it needs some higher instance to handle this, normally init or a different system supervisor program (e.g. an inittab respawn action). OK, then this higher instance cannot be an ordinary supervisor, because it must watch two intimately related processes and re-spawn both if one of them dies. Hence, it is yet another application. This is why I thought fifosvd was a good candidate to do that. Also because it already contains some supervision logic to manage the operation handler. So, if fifosvd is a generally usable tool, it must come with a companion generally usable tool, let's call it fifosvdsvd, designed to monitor pairs of pipe-connected daemons. Whereas the device operation handler (including the conf parser) is started on demand, when incoming events require it. The job of fifosvd is this on-demand pipe handling, including failure management. 2) fifosvd would never close any end of the pipe, because it could need them to re-spawn any of the other processes. Like this, there is no need for a named pipe as long as fifosvd lives. Did you look at my pseudo code? It does *not* use a named pipe (fifo) for netlink operation, but a normal private pipe (so pipesvd may fit its purpose better). Whereas the hotplug helper mechanism won't work this way, and requires a named pipe (a different setup, achieved by a slightly different startup). Yes, but it cannot work if the two long-lived daemons are supervised by an ordinary supervisor, because one end of the pipe is lost if one of the processes dies, and this kind of supervisor will restart only the one which died. And I have a suggestion for simplicity: let the timeout/no-timeout feature be a parameter only for the event handler; it does not need to change the behaviour of fifosvd. I think it is fine to restart the event handler on-event even when it dies unexpectedly. ??? At some point you considered that the operation handler might be either long-lived or dying on timeout. I suggest that the supervision logic be identical in the two cases.
Didier
Re: RFD: Rework/extending functionality of mdev
On Mon, 16 Mar 2015 18:45:26 +0100 Harald Becker ra...@gmx.de wrote: On 16.03.2015 09:19, Natanael Copa wrote: I am only aware of reading kernel events from netlink or the kernel hotplug helper. Whereas I'm trying to create a modular system, which allows *you* to set up netlink usage, *the next person* to set up hotplug helper usage (still with the speed improvement, not the old behaviour), and ... What is this new, third plug mechanism? I think that is the piece I am missing to understand why the fifo manager approach would be superior. ... the *ability* to set up a system with a different plug mechanism, not yet mentioned, using the same modular system. The system maintainer decides, just by putting together the functional blocks. Does this not-yet-mentioned plug mechanism exist? Think, for simplicity, about doing the event gathering from the sys file system with some shell script code, then forwarding the device event message to the rest of the system. Looks ugly? Looks ugly, yes. What about older or small systems without the hotplug feature? Do systems exist that are so old that they lack hotplug - but at the same time are new enough to have sysfs? I suppose it would make sense for kernels without CONFIG_HOTPLUG, but I would expect such systems to use highly customized/specialized tools rather than general-purpose tools. My intention is *not* to *solve your needs*; it is to give *you* the *tools to build* the system with *your intended functionality*, by putting together some commands or command parameters, without writing code (programs). At the only expense of some (possibly) dead code in the binary. Where dead code means dead for you, but used by others who want to set up their system in a different way (build your own BB version and opt out, if you dislike it). We have different goals, so I will likely not use your tools. I want a tool for hotplug that avoids dead code. Thanks for your patience and thanks for describing it in few words. I think I finally got it.
-nc
Re: RFD: Rework/extending functionality of mdev
On 16.03.2015 09:19, Natanael Copa wrote: I am only aware of reading kernel events from netlink or the kernel hotplug helper. Whereas I'm trying to create a modular system, which allows *you* to set up netlink usage, *the next person* to set up hotplug helper usage (still with the speed improvement, not the old behaviour), and ... What is this new, third plug mechanism? I think that is the piece I am missing to understand why the fifo manager approach would be superior. ... the *ability* to set up a system with a different plug mechanism, not yet mentioned, using the same modular system. The system maintainer decides, just by putting together the functional blocks. Think, for simplicity, about doing the event gathering from the sys file system with some shell script code, then forwarding the device event message to the rest of the system. Looks ugly? What about older or small systems without the hotplug feature? My intention is *not* to *solve your needs*; it is to give *you* the *tools to build* the system with *your intended functionality*, by putting together some commands or command parameters, without writing code (programs). At the only expense of some (possibly) dead code in the binary. Where dead code means dead for you, but used by others who want to set up their system in a different way (build your own BB version and opt out, if you dislike it).
Re: RFD: Rework/extending functionality of mdev
On 16.03.2015 21:30, Natanael Copa wrote: Do systems exist that are so old that they lack hotplug - but at the same time are new enough to have sysfs? Oh, yes! Mainly in the embedded world. I suppose it would make sense for kernels without CONFIG_HOTPLUG, but I would expect such systems to use highly customized/specialized tools rather than general-purpose tools. You fail to consider all those embedded devices which build their minimal system around BB, plus some application-specific programs. Such systems use only BB tools for the system setup, and what if they want to use a specialized plug event gatherer (however that might work)? We have different goals, so I will likely not use your tools. I want a tool for hotplug that avoids dead code. Then build your own BB version and disable, in the config, those mechanisms you do not use. Then there would be no dead code ... but you are free to use whichever tool you like. Thanks for your patience and thanks for describing it in few words. I think I finally got it. Your decision, but I still think you aren't doing anything different, except moving some of the code around without gaining any real benefit, at the expense of blocking those who would like to set up their system in a different way ... ough ... I wouldn't like to use your tools either! [For completeness: if you'd like to know why I think your design is not the right way, see what Laurent told you about this.] -- Harald
Re: RFD: Rework/extending functionality of mdev
On 16.03.2015 22:25, Didier Kryn wrote: I had not caught the point that you wanted it to be a general-purpose tool - sorry. It's a lengthy and complex discussion; it is not very difficult to miss something, so no trouble. Please ask when there are questions. The netlink reader is a long-lived daemon. It shall not exit, and shall handle failures internally where possible; but if it fails, purely restarting it, without other intervening action to control / correct the reason for the failure, doesn't look like a good choice. So it needs some higher instance to handle this, normally init or a different system supervisor program (e.g. an inittab respawn action). OK, then this higher instance cannot be an ordinary supervisor, because it must watch two intimately related processes and re-spawn both if one of them dies. Hence, it is yet another application. This is why I thought fifosvd was a good candidate to do that. Also because it already contains some supervision logic to manage the operation handler. Supervision is a system-dependent function, which differs with the philosophy by which the init process works and handles such things. So before we talk about which supervision, we need to say which type of supervision you are using, that is, mainly, which init system you use. So, if fifosvd is a generally usable tool, it must come with a companion generally usable tool, let's call it fifosvdsvd, designed to monitor pairs of pipe-connected daemons. A pipe is a unidirectional thing. Writing a program that sits at the read end of a pipe to watch the other side is a logical mixing of functions, but ... Whereas the device operation handler (including the conf parser) is started on demand, when incoming events require it. The job of fifosvd is this on-demand pipe handling, including failure management. 2) fifosvd would never close any end of the pipe, because it could need them to re-spawn any of the other processes. Like this, there is no need for a named pipe as long as fifosvd lives. Did you look at my pseudo code?
It does *not* use a named pipe (fifo) for netlink operation, but a normal private pipe (so pipesvd may fit its purpose better). Whereas the hotplug helper mechanism won't work this way, and requires a named pipe (a different setup, achieved by a slightly different startup). Yes, but it cannot work if the two long-lived daemons are supervised by an ordinary supervisor, because one end of the pipe is lost if one of the processes dies, and this kind of supervisor will restart only the one which died. ... you are wrong. When the netlink process dies, the write end of the pipe is automatically closed by the kernel. This first lets the handler process detect end-of-file when waiting for more messages, so that process exits. fifosvd then checks the pipe and gets an error telling it that the pipe has shut down on the write end, so fifosvd does the expected thing: it exits too. Even if that exit is delayed somewhat, it does not matter, since the higher instance respawns the hotplug system due to the netlink exit. The new pipe is established in parallel, while the old pipe (including its processes) vanishes after a small amount of time. That is, your supervision chain is slightly different: - netlink is supervised by the higher instance, but itself watches for failures on the pipe (in case the read end dies unexpectedly) - the supervision of the pipe's read side is a bit more complex, as we use an on-demand handler process, so there are two cases, depending on whether a handler process is currently active: - when no handler process is active, fifosvd detects a pipe failure of the write end immediately and just exits. So there is no need for supervision; only some resources have to be freed - when there is an active handler process, that process is supervised by fifosvd, but itself checks for EOF on the pipe, and exits. Meanwhile fifosvd waits for the exit of the handler process and checks the exit status (success or any failure).
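The shutdown cascade described above - the writer dies, the read side drains the pipe, sees EOF, and exits on its own - can be shown in miniature with stand-ins (not the real tools):

```shell
# The writer (standing in for nldev) exits after two events; the read
# side still processes both, then hits EOF and terminates cleanly.
printf 'add sda\nadd sdb\n' | while read -r ev; do
    echo "handled: $ev"
done
echo "read side exited after EOF"
```

No event written before the writer's death is lost; the read side only stops once the pipe is drained.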
Whichever way fifosvd takes, in the end it detects that the write end of the pipe has gone, and takes its leave. So the supervision chain is: init - netlink - fifosvd - handler At some point you considered that the operation handler might be either long-lived or dying on timeout. I suggest that the supervision logic be identical in the two cases. That was an alternative in the discussion, to show how I got to my solution, which picked up a suggestion Laurent made. So the alternatives show the steps my approach went through while being improved. I highly prefer this last one (netlink reader the Unix way). It is the version with the most flexibility, and it even addresses the wish to use a private pipe rather than a named pipe for netlink operation, without adding extra overhead for that possibility. Indeed the alternatives are very similar; they perform the same principal operation, but move some code around a bit, for different purposes, to see which impact each
Re: RFD: Rework/extending functionality of mdev
On Sun, 15 Mar 2015 01:45:05 +0100 Harald Becker ra...@gmx.de wrote: On 14.03.2015 03:40, Laurent Bercot wrote: Please take a look at my s6-uevent-listener/s6-uevent-spawner, or at Natanael's nldev/nldev-handler. The long-lived uevent handler idea is a *solved problem*. I know how that works, and this is the problem. I see limitations of this approach, which I try to overcome. 1) using the netlink mechanism only - no problem 2) using the kernel hotplug helper mechanism - fails to work, or still suffers from re-parsing the conf for each event. 3) Open up my mind and accept that the next one coming around may have a brand-new plug mechanism in his bag - may be difficult to do without changing code. I am only aware of reading kernel events from netlink or the kernel hotplug helper. What is this new, third plug mechanism? I think that is the piece I am missing to understand why the fifo manager approach would be superior. (feel free to point me to a link in the mailing list archive in case you already wrote about it and it drowned in the amount of words) -nc
hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Fri, 13 Mar 2015 13:12:56 -0600 James Bowlin bit...@gmail.com wrote: TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach. Do you use mdev as kernel helper? I have a few users (out of many thousands) who have trouble loading the modules they need using the conventional busybox tools. I enable hotplugging and then coldplug, loading everything from: find /sys/devices -name modalias -exec cat '{}' + 2>/dev/null I've also used: find /sys/devices -name uevent -exec sed -n 's/^MODALIAS=//p' '{}' + 2>/dev/null But after filtering through sort -u, the outputs of the two find commands are identical. So I have a failsafe fallback that loads all modules from: /lib/modules/$(uname -r) Alpine linux initramfs currently does: find /sys -name modalias | xargs sort -u | xargs modprobe -a using busybox modprobe. Since loading the modules might trigger new MODALIAS events, you need your hotplugger to handle those. You can use this line in mdev.conf: $MODALIAS=.* root:root 0660 @modprobe -b $MODALIAS However, what I want to do is fix the hotplug handler to be so fast that I can scan /sys for uevent files and trigger the hotplug events. This means that the hotplug event handler needs to be able to load kernel modules without (too much) forking of modprobe. I am considering libkmod as an alternative, but I would prefer that it could be solved with busybox only; however, looking at the busybox modalias code, it looks like it will require intrusive changes to make it handle modaliases from a stream. -nc
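The "scan /sys for uevent and trigger the hotplug event" step mentioned above amounts to re-firing each event by writing "add" to the uevent files. The sketch below runs against a throwaway fake sysfs tree so it is safe to execute; on a real system the loop would walk /sys itself, and each write would re-invoke the registered hotplug helper.

```shell
# Coldplug by re-triggering uevents. A fake tree stands in for /sys so
# the sketch has no side effects on the running system.
fake=$(mktemp -d)
mkdir -p "$fake/devices/fakedev"
: > "$fake/devices/fakedev/uevent"
find "$fake" -name uevent | while read -r f; do
    echo add > "$f"    # on real sysfs this re-fires the event to the helper
done
cat "$fake/devices/fakedev/uevent"
rm -rf "$fake"
```

This is why handler speed matters: one helper invocation per uevent file adds up quickly on a full /sys scan.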
Re: RFD: Rework/extending functionality of mdev
Le 16/03/2015 00:58, Harald Becker a écrit : We were looking at alternative solutions, so even one more: 3) netlink reader the Unix way Why let our netlink reader bother about where it sends the event messages? Just let it do its netlink reception job and forward the messages to stdout.

netlink reader:
    set stdout to non-blocking I/O
    establish netlink socket
    wait for event messages
    gather event information
    write message to stdout

hotplug startup code:
    create a private pipe
    spawn netlink reader, redirect stdout to write end of pipe
    spawn fifosvd - xdev parser, redirect stdin from read end of pipe
    close both pipe ends (write end open in netlink, read end in fifosvd)

The general scheme makes sense to me, but I would change two details: 1) Why not let fifosvd act as the startup code? It is anyway the supervisor of the processes at both ends of the pipe, and in charge of re-spawning them in case they die. The netlink receiver should be restarted immediately so as not to miss events, while the event handler should be restarted on event (see comment below). 2) fifosvd would never close any end of the pipe, because it could need them to re-spawn any of the other processes. Like this, there is no need for a named pipe as long as fifosvd lives. And I have a suggestion for simplicity: let the timeout/no-timeout feature be a parameter only for the event handler; it does not need to change the behaviour of fifosvd. I think it is fine to restart the event handler on-event even when it dies unexpectedly. Didier This way we can let the starting process decide which type of pipe we use: a private pipe for netlink, and a named pipe for the hotplug helper. I think this is not far away from Laurent's (or Natanael's) solution, at the only cost of a small long-lived helper process, managing the on-demand handler startup and checking for failures. A small general-purpose daemon in the sense of supervisor daemons (e.g. tcpsvd), with a generally reusable function for other purposes. ... better?
OK, but this brings me to the message format in the pipe. I strongly think we should use a textual format, but do the required checks for control characters and do some (shell-compatible) quoting. This would allow us to do: netlink reader >/dev/ttyX (to display all device plug events on a console) netlink reader >>/tmp/uevent.log (append all event messages to a log file) ... and all such things. I know the parser needs to do some checking and unquoting, but we have a single reader, and it doesn't matter how much data it reads from the pipe in a single chunk, as long as the writers ensure they write each message with a single write (atomicity). The parser assumes it is reading text from stdin, with the required checking and unquoting. This way we get maximum compatibility and may easily replace every part with some kind of script. -- Harald
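The shell-compatible quoting suggested above might look like this; the format is illustrative, not a specification. Each event is one line of KEY='value' pairs, so a shell reader can unquote it with eval:

```shell
# One event per line, shell-quoted; a reader unquotes it with eval.
# emit and the field names are hypothetical, for illustration only.
emit() { printf "ACTION='%s' DEVPATH='%s'\n" "$1" "$2"; }
emit add /devices/pci0000:00/usb1 | while read -r line; do
    eval "$line"                 # sets ACTION and DEVPATH
    echo "event: $ACTION $DEVPATH"
done
```

A real writer would also have to escape single quotes and reject control characters before emitting, as the text notes.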
Re: RFD: Rework/extending functionality of mdev
On Sun, Mar 15, 2015 at 08:06 PM, Laurent Bercot said: the kernel guarantees not only atomicity for write operations, but also for read operations (not in POSIX, AFAIK). Second sentence of http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html : ^ For goodness sake. This appears to be an argument merely for the sake of having an argument. If, say, Linux has documentation somewhere explaining what it does with multiple readers on a pipe, [...] A simple Internet search brings up the Linux fifo man page: http://man7.org/linux/man-pages/man7/fifo.7.html A FIFO special file (a named pipe) is similar to a pipe, except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing. [...] and committing to NOT changing that behaviour EVER, IMO this is a ridiculous demand that can't ever be met by any software. For my users the current busybox mdev hotplugging is not 100% reliable (more like 99+% reliable), which is a big pain. I'd love to see other busybox hotplug solutions that are selectable at runtime. You both have a lot of value to contribute. If both/all solutions can be made (via compile-time options) available at runtime, and one is clearly superior to the others, then that will quickly be figured out. It is very kind to try to keep someone else from wasting time on a technically inferior solution, but at some point it is better for everyone to just let them go ahead and use their time as they see fit. Peace, James
Re: RFD: Rework/extending functionality of mdev
On 14.03.2015 03:40, Laurent Bercot wrote: What would you do if your kid wanted to drive a car but said he didn't like steering wheels, would you build him a car with a joystick ? ... [base car with wheel steering module replaceable by a joystick module] ... ... and the next one coming around, with an automatic steering module, may replace the wheel steering or joystick module, plug in his module, and also take advantage of your base car ... -- Harald
Re: RFD: Rework/extending functionality of mdev
On 14.03.2015 03:40, Laurent Bercot wrote: - for reading: having several readers on the same pipe is the land of undefined behaviour. You definitely don't want that. ... just out of curiosity: On most systems it is perfectly possible to have multiple readers on a pipe, as long as all readers and writers are polite enough to use the same message size (= PIPE_BUF). On most (but not all) Unix systems the kernel guarantees not only atomicity for write operations, but also for read operations (not in POSIX, AFAIK). With multiple readers you get load balancing. The first available reader gets the next message from the pipe and has to handle that message. You can't predict which process will receive a specific message, so every process has to handle all types of incoming messages. This is usually the case when all readers run the same program. With a small helper you can even fire up a new reader process when there is more data in the pipe, then sleep some time to let the new process pick up a message from the pipe, and then check the pipe again for more data, firing up the next reader (up to an upper limit of reader processes). Reader processes just die when no more data is available in the pipe. This applies to pipes in general, whether private or named (fifo). -- Harald
Re: RFD: Rework/extending functionality of mdev
On 15/03/2015 19:39, Harald Becker wrote: On most systems it is perfectly possible to have multiple readers on a pipe, when all readers and writers be so polite to use the same message size (= PIPE_BUF). On most (but not all Unix systems) the kernel guaranties not only atomicity for write operations, but also for read operations (not in POSIX, AFAIK). Second sentence of http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html : The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified. What most systems do in practice is irrelevant. There is no guarantee at all on what a system will do when you have multiple readers on the same pipe; so it's a bad idea, and I don't see that there's any room for discussion here. If, say, Linux has documentation somewhere explaining what it does with multiple readers on a pipe, and committing to NOT changing that behaviour EVER, then it might be reasonable for Linux-specific software to rely on it. I'm not aware of such a piece of documentation though. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On 15.03.2015 20:06, Laurent Bercot wrote: What most systems do in practice is irrelevant. There is no guarantee at all on what a system will do when you have multiple readers on the same pipe; so it's a bad idea, and I don't see that there's any room for discussion here. My lead-in was "just for curiosity", and that's it: it works on many systems. But I never proposed doing something like that. It's what my lead-in says: curiosity. -- Harald
Re: RFD: Rework/extending functionality of mdev
On 15/03/2015 20:41, James Bowlin wrote: http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html : ^ For goodness sake. This appears to be an argument merely for the sake of having an argument. This is POSIX.1-2008, the very specification that Linux, and other operating systems, are supposed to implement. It is the authoritative reference to follow whenever you're designing Unix software. I don't understand what your objection is. A FIFO special file (a named pipe) is similar to a pipe, except that it is accessed as part of the filesystem. It can be opened by multiple processes for reading or writing. Yes, I know the Linux man pages, and of course multiple processes are allowed to open a pipe for reading. But there is nothing in that page that documents what Linux does when multiple processes actually attempt to read on the same pipe. [] and committing to NOT changing that behaviour EVER, IMO this is a ridiculous demand that can't ever be met by any software. This is called specification and normalization, i.e. what standards are for. Sure, standards change and evolve, and that's a good thing; my point is when something is explicitly non-standardized, it is not a good idea to do that thing and expect a fixed behaviour. There really is no room for disagreement here. For my users the current busybox mdev hotplugging is not 100% reliable (more like 99+% reliable) which is a big pain. I'd love to see other busybox hotplug solutions that are selectable at runtime. So would I, and the solution you're looking for is called netlink + mdev -i, which is all that remains to be implemented. I would really like to cut on the bikeshedding and see some real work done now; if nothing has appeared when I get some time, I'll do it myself - which so far seems the only way to get things done, and which will help you much more than buggy solutions in search of a problem. 
It is very kind to try to help someone else from wasting time on a technically inferior solution but at some point it is better for everyone to just let them go ahead and use their time as they see fit. Multiple readers on a pipe is not technically inferior, it is technically *invalid*. I'm not preventing anyone from coding anything, but I will fight inclusion of buggy code into busybox, which is a major disservice to do to you and your users. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On 15.03.2015 20:06, Laurent Bercot wrote: The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified. That is, you can't predict which process will get the data, but each single read operation on a pipe (private or named) is atomic. Either it completes and reads the requested number of bytes, or it does not happen at all. It won't read half of the data, let some data pass to a different process, then continue with the read in the first process (or do a short read when there is enough data). I don't want to introduce this or use it, but please stop and think about it: when every writer and every reader agrees on the size of messages written to / read from the pipe, you can have multiple writers *and* multiple readers on the same pipe, due to the atomicity of the write and read operations. ... and I know, it's not in POSIX / OpenGroup ... it's just working practice ... try it and you will see, it works. ... for curiosity :) (And don't fear, I won't do this)
Re: RFD: Rework/extending functionality of mdev
On 15/03/2015 23:44, Harald Becker wrote: There are too many stupid programmers out there who would try to add something like that into system management related programs. It couldn't get worse. Even if it works at first glance, it is error prone, and the next person who changes the message size of one process will break the chain, and possibly smash the system (as it runs with root privileges). Laurent just overlooked my lead-in: just for curiosity. And (again) that's it: curiosity. ... at least until the behaviour has been standardized, clearly documented (not the same as the former), and guaranteed. Yes. Thank you for acknowledging it and for alleviating my fear. -- Laurent
Re: RFD: Rework/extending functionality of mdev
On 15/03/2015 21:54, Harald Becker wrote: My lead in was: just for curiosity, and that's it, it works on many systems. ... but I never proposed, doing something like that. It's what my lead in says: curiosity. Okay, fair enough. Please let's not do it then. :) -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
Hi James! 1) Why argue over something that has already been admitted? It does not bolster your argument and it does not put you in a good light. Keep cool, I know Laurent's fears. There are too many stupid programmers out there who would try to add something like that into system management related programs. It couldn't get worse. Even if it works at first glance, it is error prone, and the next person who changes the message size of one process will break the chain, and possibly smash the system (as it runs with root privileges). Laurent just overlooked my lead-in: just for curiosity. And (again) that's it: curiosity. ... at least until the behaviour has been standardized, clearly documented (not the same as the former), and guaranteed. -- Harald
Re: RFD: Rework/extending functionality of mdev
So, as I'm not in write-only mode, here are some possible alternatives we could choose (maybe it shows better how and why I arrived at my approach):

1) netlink with a private pipe to a long lived handler:

   establish netlink socket
   spawn a pipe to handler process
   write a "netlink, no timeout" message to pipe
   wait for event messages
   gather event information
   write message to pipe

The initial pipe message lets the parser / handler know that we are in netlink operation and disables the timeout functionality, resulting in both processes being long lived. This won't harm the system much, as memory of sleeping processes is usually swapped out, but still resources lie around unused. @Laurent: You know the race conditions why the handler process needs to be long lived here, or why we otherwise need complex pipe management with re-spawning handlers, and all that stuff. You told us about them. This would indeed be the simplest solution when splitting netlink reader and handler. Other mechanisms may still create a named pipe and use the same handler for their purpose. With the caveat of two long lived processes, of which I call one big. So, look forward to the second alternative ...

2) netlink with a private pipe but on-demand start of the handler (avoiding the race):

   create a pipe and hold both ends open (but never read)
   establish netlink socket
   wait for event message
   gather event information
   if no handler process running
     spawn a new handler process, redirecting stdin from read end of pipe
   write message to pipe

   with a SIGCHLD handling of:

   get status of process
   do failure management
   check for data still pending in pipe
   re-spawn a handler process, redirecting stdin from read end of pipe

The netlink reader is a long lived process; the handler is started on demand when required and may die after some timeout. Races won't happen this way, as the pipe does not vanish, and data written into the pipe during exit of an old handler does not get lost (the next handler will get the message). ... better?
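The key property of alternative 2 above can be sketched in a few lines of Python (illustrative only; the real implementation would be C inside busybox). The listener holds both ends of an anonymous pipe for its whole lifetime, so an event written while no handler is running simply queues up and is consumed by the handler spawned afterwards:

```python
# Sketch of alternative 2: the (stand-in) netlink reader holds BOTH ends
# of an anonymous pipe open, so messages written while no handler is
# running cannot be lost; the next on-demand handler picks them up.
# The "handler" here is just a forked child reading one line.
import os

r, w = os.pipe()  # listener keeps both fds open for its whole lifetime

def spawn_handler():
    pid = os.fork()
    if pid == 0:                      # child: the on-demand handler
        os.close(w)
        with os.fdopen(r, "rb") as f:
            line = f.readline()       # consume one queued event
        os._exit(0 if line else 1)
    return pid

# Event arrives while NO handler is running: the write is not lost,
# because the listener's own read end keeps the pipe alive.
os.write(w, b"add /block/sda\n")

pid = spawn_handler()                 # started on demand, drains the event
_, status = os.waitpid(pid, 0)
print("handler exit status:", os.waitstatus_to_exitcode(status))
```

The exit status is 0 because the handler found the queued event waiting in the pipe, exactly the race-free behaviour the step list describes.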
This is what I want to do, with an additional choice for more clarity: let the netlink reader do its job, and split off the pipe management and handler start into a separate thread, but otherwise exactly the same operation. With *no* extra cost, the pipe management and the handler startup may then be used for other mechanism(s). ... still afraid of using a named pipe? You would still prefer a private pipe for netlink? ... ok, look at the next alternative (and this one I came up with taking your fears into account).

3) netlink spawning an external supervisor for on-demand handler startup

netlink reader:

   establish netlink socket
   create a pipe, save write end for writing to pipe
   spawn "fifosvd - xdev parser", redirecting stdin from read end of pipe
   close read end of pipe
   wait for event messages
   gather event information
   write message to pipe

fifosvd:

   save and hold read end of pipe open (but never read)
   wait until data arrives in pipe (poll for read)
   spawn handler process, handing over the pipe read end to stdin
   wait for exit of process
   failure management

A novice may think this way we added another process to the data flow, but no, the data flow is still the same: netlink - pipe - handler. The extra process is a small helper containing the code for the on-demand start of the handler and the failure management, but it never gets in contact with the data passed through the pipe. This approach allows simple reuse of code for other mechanism(s), and fifosvd may be of general use: when its argument is a single dash (-), it uses the pipe from stdin, else it creates and opens a named pipe. It may also be used for on-demand start of other jobs:

   process producing occasional data | script to process the data

may be changed to:

   process producing data | fifosvd - script to process data

which starts script on demand when data arrives in the pipe, and when script dies, restarts it as soon as more data is in the pipe. This is extra benefit from my approach, with no extra cost.
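The core fifosvd trick — hold the fifo open so it never sees EOF, poll until data is pending, and only then start a handler on the fifo's read end — can be sketched in Python (fifosvd itself is a proposal, not an existing tool; all names here are illustrative):

```python
# Sketch of the proposed fifosvd idea: keep the fifo from ever reporting
# EOF by holding an O_RDWR fd, poll until data is pending, then spawn a
# handler with the fifo as its stdin. "head -n 1" stands in for the
# real event parser.
import os
import select
import subprocess
import tempfile

fifo = os.path.join(tempfile.mkdtemp(), "xdev.fifo")
os.mkfifo(fifo, 0o600)

# O_RDWR keeps a reader AND a writer attached from one fd: event sources
# may come and go without the fifo seeing EOF, and queued data survives
# between handler runs.
fd = os.open(fifo, os.O_RDWR)

# Simulate an event source writing while no handler exists.
os.write(fd, b"add sda\n")

poller = select.poll()
poller.register(fd, select.POLLIN)
if poller.poll(1000):                  # data pending: now spawn a handler
    handler = subprocess.run(
        ["head", "-n", "1"], stdin=fd, capture_output=True)
print(handler.stdout)
```

The supervisor never reads the event data itself; it only watches for readability and hands the descriptor to the handler, matching the "never get in contact with the data" claim above.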
I hope this helps allay some fears. -- Harald
Re: RFD: Rework/extending functionality of mdev
On Sun, Mar 15, 2015 at 10:10 PM, Laurent Bercot said: This is POSIX.1-2008, the very specification that Linux, and other operating systems, are supposed to implement. It is the authoritative reference to follow whenever you're designing Unix software. I don't understand what your objection is. I tried to make my complaint clear by highlighting two portions of text, one by Harald and one by you. I apologize that my post was unclear to you. You omitted one part of text I had highlighted so I repeat it here: On Sun, Mar 15, 2015 at 01:41 PM, James Bowlin said: kernel guarantees not only atomicity for write operations, but also for read operations (not in POSIX, AFAIK). The post you were replying to already admitted that multiple fifo readers was not POSIX compliant. 1) Why argue over something that has already been admitted? It does not bolster your argument and it does not put you in a good light. 2) While violating POSIX is usually not a good idea, it is well known that POSIX is woefully incomplete and there are long-standing extensions to POSIX on all real-world systems, including busybox. I propose an extension to Godwin's law: if someone objects to something because it relies on a long-standing extension to a POSIX standard then that person automatically loses the argument. I think your suggestions have been very valuable and your proposals may well represent a superior solution technically. I am a little frustrated by the silly arguments because I am so looking forward to seeing the fruition of your ideas. Peace, James
Re: RFD: Rework/extending functionality of mdev
On 14.03.2015 03:40, Laurent Bercot wrote: Hm, after checking, you're right: the guarantee of atomicity applies with nonblocking IO too, i.e. there are no short writes. Which is a good thing, as long as you know that no message will exceed PIPE_BUF - and that is, for now, the case with uevents, but I still don't like to rely on it. Named pipes are a proven IPC concept, not only in the Unix world. They are pipes and behave exactly like them, including non blocking I/O, programming the poll loop, and failure handling. There is only one difference: the method of getting access to the pipe file descriptors (either calling pipe() or open()). I call pipe an anonymous pipe. And an anonymous pipe, created by the netlink listener when it forks the event handler, is clearly the right solution, because it is private to those two processes. With a fifo, aka named pipe, any process with the appropriate file system access may connect to the pipe, and that is a problem: Right, any process with root access may write to this pipe, but don't you think such processes have the ability to do other nasty things, like changing the device node entries in the device file system directly? May processes with root access produce confusion on the pipe? Yes, but aren't such processes able to produce any kind of confusion they like? We could have (at some slight extra cost): - create the fifo with devparser:pipegroup 0620 - run hotplug helper (if used) suid:sgid to hotplug:pipegroup (or drop privileges to that) - drop netlink reader after socket creation to same user:group - run the fifo supervisor as devparser:parsergroup - but then we need to run the parser as suid root It needs to access the device file system and do some operations which require root (as far as I remember). Any suggestion how to avoid that suid root? - for writing: why would you need another process to write into the pipe ?
You have *one* authoritative source of information, which is the netlink listener; any other source is noise, or worse, malevolent. You are stuck on netlink usage and overlook that you are forcing others to do it your way. No doubt about the reasons for using netlink, but why force it on those who dislike it? Wouldn't this forcing be no different from forcing others to rely on e.g. systemd? (provocation, not expected to be answered) Whereas I'm trying to give the user (or say, system maintainer) the ability to choose the mechanism he likes, and even with the chance to flip the mechanism by just modifying one or two parameters or commands. Flipping the mechanism is even possible in a running system, without disturbance and without changing configuration. So why is this approach worse than forcing others to do things in a specific way? Apart from those known arguments why netlink is the better solution, where we absolutely agree. - for reading: having several readers on the same pipe is the land of undefined behaviour. You definitely don't want that. Is anyone here trying to have more than one reader on the pipe? The only reader of the pipe is the parser, and precisely because we are using fifos, the parser shouldn't bet on incoming message format and content. It shall do sanity checks on those before usage (and here we hit the point where I expect some overhead, though not much, due to other changes). Isn't it good practice to do this for other pipes too (even if a bit more out of paranoia)? But all with the benefit of avoiding re-parsing the conf for every incoming event, and an expected overall speed improvement. Not to mention the possibility to choose/flip the mechanism as the user likes. This even includes extra possibilities for e.g. debugging and watching purposes. With a simple redirection of the pipe you may add event logging functionality and/or a live display of all event messages (possibly filtered by a formatting script / program).
All without extra cost / impact for normal usage, and without creating special debug versions of the event handler system. I'm just trying to make it modular, not monolithic. - generally speaking, fifos are tricky beasts, with weird and unclear semantics around what happens when the last writer closes, different behaviours wrt kernel and even *kernel version*, and more than their fair share of bugs. Programming a poll() loop around a fifo is a lot more complicated, counter-intuitive, and brittle, than it should be (whereas anonymous pipes are easy as pie. Mmm... pie.) See my statement about fifos above. I don't know what you fear about fifos, but their usage and functionality are more proven in the Unix world than you expect. Sure, you need to watch your step, but this should also be done when using pipes (even if only out of paranoia, e.g. checking incoming data before usage instead of blind reliance). And maybe there are internal differences in pipe / fifo handling between kernels, but likely they are internal and don't change the expected usage
RE: RFD: Rework/extending functionality of mdev
Stream-writes [pipe] are not atomic, and your message can theoretically get cut in half and interleaved with another process writing the same fifo. Any pipe, whether named or not, IS atomic so long as the datagrams in question are smaller than PIPE_BUF in size. This has been true since Day 1, in every Unix worthy of the name. You have to be careful on the reads, though: you need to embed the size of the datagram into itself so that you can be sure you don't get them packed together. If the datagrams are of fixed size, then you don't even need this. Most of the pipes I use this way have a datagram whose first field is the size. Atomic write(2) of each datagram into the pipe. The reader does a read(2) of the size field, followed by a read sized to get the rest of the datagram. No muss, no fuss, and pretty darned fast, too. And it works _everywhere_, on every Unix/Linux/whateverix version known to Man. -- Jim ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
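Jim's size-prefixed datagram scheme might look like this in Python (an illustrative sketch; busybox itself would do the equivalent in C). Each datagram is written with a single write(), which is atomic as long as the whole frame stays under PIPE_BUF; the reader first reads the fixed-size length field, then loops for the remaining bytes:

```python
# Sketch of a size-prefixed datagram protocol over a pipe: one atomic
# write() per datagram, two-phase read on the other side.
import os
import struct

r, w = os.pipe()

def send(fd, payload):
    frame = struct.pack("!I", len(payload)) + payload
    assert len(frame) <= 4096            # stay under a conservative PIPE_BUF
    os.write(fd, frame)                  # one atomic write per datagram

def recv(fd):
    size, = struct.unpack("!I", os.read(fd, 4))
    data = b""
    while len(data) < size:              # a read may be short; loop
        data += os.read(fd, size - len(data))
    return data

send(w, b"ACTION=add DEVPATH=/block/sda")
send(w, b"ACTION=remove DEVPATH=/block/sda")
m1 = recv(r)
m2 = recv(r)
print(m1, m2)
```

Because the size travels inside the frame, datagrams packed back-to-back in the pipe buffer are split apart correctly, which is the "no muss, no fuss" property described above.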
Re: RFD: Rework/extending functionality of mdev
TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach. On Fri, Mar 13, 2015 at 02:04 PM, Guillermo Rodriguez Garcia said: Perhaps my message is not getting across. What I am saying is that I am not sure your suggestion is actually something others would find useful. Other than the fact that you like it yourself I don't see a lot of enthusiasm around about it. I've been following this thread on and off and I feel *very* enthusiastic about Harald Becker's approach. I kept silent because there was already so much posting and I figured everyone would finally get on board with Harald's suggestions. I admit I build my own busybox but I would *hate* to have to make this choice at build-time. I want to give my users a boot-time option (failsafe or something like that) to go back to the older, slower way. Or maybe make the older way the default and give them an option to use the new approach. Making it a build-time option is extremely hubristic; it assumes both the current code and the new code are perfect. I have a few users (out of many thousands) who have trouble loading modules they need using the conventional busybox tools. I enable hotplugging and then coldplug, loading everything from: find /sys/devices -name modalias -exec cat '{}' + 2>/dev/null I've also used: find /sys/devices -name uevent -exec sed -n 's/^MODALIAS=//p' '{}' + 2>/dev/null But after filtering through sort -u the outputs of the find commands are identical. So I have a failsafe fallback that loads all modules from: /lib/modules/$(uname -r) The alias for the needed module shows up in those lists but it doesn't always get loaded. Oddly enough the crucial module does *not* show up when I pipe that output through: | xargs modprobe -a -D But it shows up when I use the Debian modprobe.
The crucial module does get loaded on my system when I use: | xargs modprobe -a This is a strange bug in modprobe -a -D but it does not explain the bug that causes failure to load certain modules via hotplug + coldplug. Maybe the bug is in my code; maybe the bug is in busybox. I don't know for sure. My guess is it is a glitch in the hotplugging. I have not heard back from the user who reported the problem. The next test would be to keep doing the cold-plugging and see if that fixes the problem. Last year I discovered and reported a bug in the small modprobe when it did not load certain modules during a coldplug. The bug was fixed but I have since moved to the large modprobe. I'm hoping that a more streamlined and orderly hotplug solution in busybox will fix the problem above so I don't have to ever resort to loading all modules. If Harald's solution is a runtime option then I could make it a boot-time option and test it on thousands of machines. ISTM this is the only sane approach. I also liked this suggestion from Natanael: On Fri, Mar 13, 2015 at 10:33 AM, Natanael Copa said: While solving this, I would also like to find a way to load the MODALIAS without forking. But I don't think this is the answer: One option is to add modprobe -i which reads modaliases from stdin with a timeout. Maybe that is a lower hanging fruit than mdev -i? Since ISTM that can be handled with xargs (does that fork a lot?). ISTM there is a lot of forking in the find commands as illustrated above. I think we would need to use: modprobe --find-modalias /sys/devices where the user gets to define which directory is searched. There could also be a: modprobe --find-uevent /sys/devices where the grep/sed is built-in as well. OTOH, perhaps these would be a nightmare code-wise. I'd sure like to know if this approach is significantly faster or not. BTW: I try to time most things in my initrd init script to keep an eye on what slows things down so we can stay as speedy as possible. 
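The gathering-and-dedup step James describes can be sketched in Python, with a temporary directory standing in for /sys/devices so the sketch is self-contained (a comment marks where the real code would invoke modprobe; all file names here are illustrative):

```python
# Sketch of the coldplug gathering step: collect every "modalias" file
# under a tree, deduplicate the values, and hand the unique set to
# modprobe. A temp dir stands in for /sys/devices.
import os
import tempfile

sysdir = tempfile.mkdtemp()
for sub, alias in [("a", "usb:v1D6Bp0002"), ("b", "pci:v8086d1234"),
                   ("c", "usb:v1D6Bp0002")]:        # duplicate on purpose
    os.makedirs(os.path.join(sysdir, sub))
    with open(os.path.join(sysdir, sub, "modalias"), "w") as f:
        f.write(alias + "\n")

aliases = set()
for root, _dirs, files in os.walk(sysdir):
    if "modalias" in files:
        with open(os.path.join(root, "modalias")) as f:
            aliases.add(f.read().strip())

# Real code would now run: modprobe -a -b -q <aliases>
print(sorted(aliases))
```

This is the same effect as `find /sys/devices -name modalias -exec cat '{}' + | sort -u`, and doing it in one process avoids the forking that the thread worries about.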
I use: cut -d ' ' -f22 /proc/self/stat to get the time in hundredths of a second since the kernel booted. But this itself and the associated arithmetic is not free time-wise, so I disable most of the timing when I'm not in debug mode. If there is a better/faster way to get the time to hundredths of a second or better, PLMK. A shell built-in would make me very happy. Perhaps this seems way OT, but ISTM that precise timing is essential for comparing how fast things are when the overall time is only a second or two. If there is a better way to get the time, PLMK; otherwise I suggest people use the cut command above for timing how long things take. Peace, James
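For reference, field 22 of /proc/self/stat is the process start time in clock ticks since boot (ticks are usually 1/100 s, hence "hundredths of a second"). A Python sketch of the same trick, Linux-only; note that field 2 (the comm name) may contain spaces, so splitting after the closing parenthesis is more robust than `cut -d ' ' -f22`:

```python
# Read the process start time (field 22 of /proc/self/stat) in clock
# ticks since boot, and convert to centiseconds. Linux-specific.
import os

with open("/proc/self/stat") as f:
    stat = f.read()

# Fields 1-2 are "pid (comm)"; comm may contain spaces, so split on the
# LAST ')' and count fields from there: field 3 (state) is index 0.
after_comm = stat.rsplit(")", 1)[1].split()
starttime_ticks = int(after_comm[19])   # overall field 22 = index 19 here

ticks_per_sec = os.sysconf("SC_CLK_TCK")  # usually 100
print("centiseconds since boot:", starttime_ticks * 100 // ticks_per_sec)
```

For a freshly spawned process this is effectively "time since boot", which is what makes it usable as a cheap boot-timing clock.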
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 16:53, Natanael Copa wrote: I have the feeling (without digging into the busybox code) that making mdev suitable for a long-lived daemon will not be that easy. I suspect there is lots of error handling that will just exit. A long-lived daemon would need to log and continue instead. The major mdev part will not be converted into a long-lived daemon. That is more or less code in a process doing a job like cat, but still you are right. It will take some time and has to be done carefully. No doubt. One of the reasons to have that fifo supervisor is the failure management. Even if the parser / handler process dies, this is caught and can fire up a failure script (something we do not have now). The kernel hotplug handler stays a normal process overall; it only needs to write the gathered info to the pipe instead of calling a function to handle the event operation. So it is mainly replacing the handler function with a write to the named pipe, and on the other side the hotplug gathering part is replaced by a read from stdin (pipe) with timeout. So the major work will go into restructuring the parser / device operation handler code, and I expect the need of a parser rewrite (which is straightforward for that simple syntax). Don't misunderstand, it will be an expensive piece of work, but compared to finding a specific bug in somebody else's program, the mdev code is simple ... and it's not my first busybox hacking; it is only the first time I do it as a public discussion in this channel / list. I started creating specialized BB versions around 1995, so I have some experience now. Did you note that pseudo code for the fifo supervisor? Maybe it hopped to the other thread. That is standard code, more or less comparable to e.g. tcpsvd
Re: RFD: Rework/extending functionality of mdev
On Friday, 13 March 2015, James Bowlin bit...@gmail.com wrote: On Fri, Mar 13, 2015 at 08:33 PM, Guillermo Rodriguez Garcia said: You are talking about a possible bug in the current implementation. In my opinion this is completely independent from whether a redesign/architecture change is required or wanted. ISTM you are assuming the redesign will not fix the bug. No, I am not assuming that. I am saying: if there is a bug, then let's fix it. The discussion on whether mdev needs a redesign is independent of that. In other words, I am saying that "there is a bug" is not in itself a reason to do a redesign. If the different versions are runtime options then it will be easy to see if the new versions fix the bug or not. Let's make it easy for me to see if your assumption is correct or not. There was no such assumption. [...] the only sane approach is to let the choice be at runtime so there is a fallback in case there is a bug that only shows up on specific hardware. This approach seems so obvious to me that I can't imagine it is controversial. No controversy on my side. In fact I am not advocating compile time options over runtime options, nor the opposite. I was just trying to understand why Michael's proposal was not good enough for Harald. Guillermo -- Guillermo Rodriguez Garcia guille.rodrig...@gmail.com
Re: RFD: Rework/extending functionality of mdev
Hi James, On Friday, 13 March 2015, James Bowlin bit...@gmail.com wrote: TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach. You are talking about a possible bug in the current implementation. In my opinion this is completely independent from whether a redesign/architecture change is required or wanted. [...] BTW: I try to time most things in my initrd init script to keep an eye on what slows things down so we can stay as speedy as possible. I use: cut -d ' ' -f22 /proc/self/stat to get the time in hundredths of a second since the kernel booted. But this itself and the associated arithmetic is not free time-wise. So I disable most of the timing when I'm not in debug mode. If there is a better/faster way to get the time to hundredths of a second or better, PLMK. On embedded targets I normally use grabserial for boot timing ( http://elinux.org/Grabserial) Just in case it's useful. Guillermo -- Guillermo Rodriguez Garcia guille.rodrig...@gmail.com
Re: RFD: Rework/extending functionality of mdev
On 13/03/2015 21:32, Michael Conrad wrote: I stand corrected. I thought there would be a partial write if the pipe was mostly full, but indeed, it blocks. Except you have to make the writes blocking, which severely limits what the writers can do - no asynchronous event loop. For a simple cat-equivalent between the netlink and a fifo, it's enough, but it's about all it's good for. And such a cat-equivalent is still useless. It's still a very bad idea to allow writes from different sources into a single fifo when there's only one authoritative source of data, in this case the netlink. If a process wants to read uevents from a pipe, it can simply read from an anonymous pipe, being spawned by the netlink listener. That's what my s6-uevent-listener and Natanael's nldev do, and I agree with you: there's simply no need to introduce fifos into the picture. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On Fri, Mar 13, 2015 at 08:33 PM, Guillermo Rodriguez Garcia said: You are talking about a possible bug in the current implementation. In my opinion this is completely independent from whether a redesign/architecture change is required or wanted. ISTM you are assuming the redesign will not fix the bug. If the different versions are runtime options then it will be easy to see if the new versions fix the bug or not. Let's make it easy for me to see if your assumption is correct or not. It is much easier on my end to tell a user to use a boot parameter than it is for me to build a new busybox for them and then have them install a new initrd. Even if the new designs do not fix the specific bug I mentioned, since the redesigns are dealing with hardware related issues (loading hardware specific modules) we really want to test the new designs on a large variety of hardware. ISTM the only sane approach is to let the choice be at runtime so there is a fallback in case there is a bug that only shows up on specific hardware. This approach seems so obvious to me that I can't imagine it is controversial. My access to thousands of machines has allowed me to catch and report a nasty bug in modprobe last year. I've also detected another bug related to loading hardware specific modules but this 2nd bug has been harder to nail down. My point is that if we make the new designs a runtime option then it makes it easier and safer for me to test them on thousands of different machines. There may be others who are in a similar situation who also have busybox code that runs on many different machines. Peace, James Science is the belief in the ignorance of experts. -- Richard Feynman ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On 3/13/2015 11:21 AM, Harald Becker wrote: On 13.03.2015 12:41, Michael Conrad wrote: Stream-writes are not atomic, and your message can theoretically get cut in half and interleaved with another process writing the same fifo. (in practice, this is unlikely, but still an invalid design) ---snip--- O_NONBLOCK disabled, n <= PIPE_BUF: All n bytes are written atomically; write(2) may block if there is not room for n bytes to be written immediately ---snip--- I stand corrected. I thought there would be a partial write if the pipe was mostly full, but indeed, it blocks. If someone really wants a netlink solution they will not be happy with a fifo approximation of one. You missed the fact that my approach allows free selection of the mechanism. Choosing netlink means using netlink, as it should be. The event listener part is as small as possible and writes to the pipe, which fires up a parser / handler to consume the event messages. Are you suggesting even the netlink mode will have a process reading the netlink socket and writing the fifo, so another process can process the fifo? The netlink messages are already a simple protocol, just use it as-is. Pass the On 3/13/2015 12:14 PM, Harald Becker wrote: The new code would not be run like a hotplug helper, it would be run as a daemon, probably from a supervisor. But the old code is still there and can still be run as a hotplug helper. The new code behaves exactly like the old code. When used as a hotplug helper, it suffers from parsing the conf for each event. My approach is a splitting of the old mdev into two active threads, which avoids those problems even for those who like to stay with kernel hotplug. Then it sounds like indeed, you are introducing new configuration steps for the old-style hotplug helper? i.e. where does the fifo live? who owns it? what security implications does this have? Who starts the single back-end mdev processor? If started from the hotplug-helper, who ensures that only one gets started?
If people have existing systems using hotplug-helper mdev, you can't just change the implementation on them in a way that requires extra configuration. Everyone who has commented on this thread so far agrees with that. -Mike ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
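The PIPE_BUF guarantee quoted in the message above can be illustrated with a short sketch (Python used as a stand-in here; the event strings and the NUL framing are made-up examples, not busybox's actual format):

```python
import os
import select

# Writes of at most PIPE_BUF bytes to a pipe are atomic, so fixed-framing
# messages from several writers never interleave. This is the property the
# fifo design relies on.
PIPE_BUF = select.PIPE_BUF  # the OS's PIPE_BUF constant (4096 on Linux)

def write_event(fd, payload):
    # One event, one write(2) call, NUL-terminated framing (assumption).
    assert len(payload) <= PIPE_BUF, "event too large for an atomic write"
    os.write(fd, payload)

r, w = os.pipe()
write_event(w, b"add@/devices/pci0000:00/usb1\0")
write_event(w, b"remove@/devices/pci0000:00/usb1\0")
os.close(w)

data = os.read(r, 65536)
events = [e for e in data.split(b"\0") if e]
print(len(events))  # 2
```

Because each event fits in one atomic write, the reader can always split on the terminator without ever seeing half a message.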
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 23:33, Laurent Bercot wrote: On 11/03/2015 08:45, Natanael Copa wrote: With that in mind, wouldn't it be better to have the timer code in the handler/parser? When no new messages come from the pipe within a given time, the handler/parser just exits. I've thought about that a bit, to see if there really was value in making the handler exit after a timeout. And it's a lot more complex than it appears, because you then get respawner design issues, the same that appear when you write a supervisor. Which issues? What if the handler dies too fast and there are still events in the queue? Should you respawn the handler instantly? Spawning the handler is the job of the named pipe supervisor. At first it checks the exit code of the dying handler and spawns a failure script if it was not successful. Then it waits until data arrives in the pipe (or is still there = poll for reading), and finally spawns a new handler. The trick in this is to hold the pipe open for reading and writing in the supervisor. This way you avoid race conditions from recreating new pipes, and catch even the situation where an event arrives at the moment the handler has hit its timeout and is dying. Otherwise the supervisor does not touch the content transferred through the pipe. That's exactly the kind of load you're trying to avoid by having a (supposedly) long-lived handler. Should you wait for a bit before respawning the handler? How long are you willing to delay your events? A bit more checking is planned already. Currently I have a failure counter and detect when the parser successively dies unsuccessfully, but maybe we can add a respawn counter, which triggers a delay (maybe increasing) on too many respawns without processing all the pipe data; but when the handler exits and the pipe is empty (poll), the respawn counter is reset.
So you get two or three fast respawns after the handler dies (when there is a timeout on poll) and more data is in the pipe; then something seems to be wrong, so start adding increasing delays before respawning. The normal case is: when the handler exits due to a timeout, the pipe is empty, so we can reset the counter and have no need to delay the process respawn as soon as new data arrives in the pipe. And when the respawn counter goes above some limit or the handler dies unsuccessfully, a failure script is spawned first, with arguments: program name, exit code or signal, failure count. It is necessary to ask these questions, and have the mechanisms in place to handle that case - but the case should actually never happen: it is an admin error to have the event handler die too fast. Admins don't make errors! ;) So it's code that should be there but should never be used; and it's already there in the supervisor program that should monitor your netlink listener. Ok, you expect the netlink listener to be watched by a supervisor daemon? Fine, so the fifo supervisor should also be watched, as it got forked from the same process as the netlink reader ... that means when we detect handler failures, we can just die and let the outer supervisor do the job :) When that happens the system is usually on its way to hell ... and even if that happens, what does it mean to the system? ... hotplug events are no longer handled, we lose them and may have to re-trigger the plug events as soon as hotplug events are processed again (however this is achieved) ... and in the worst case you are back at semi-automatic device management, calling mdev -s to update the device file system. ... but consider the conf file got vandalized, or the device file system ... how do we recover from this? ... do you expect to handle those? ... wouldn't it be better to reboot, after counting the failure in some persistent storage? So my conclusion is that it's just not worth it to allow the event handler to die.
s6-uevent-listener considers that its child should be long-lived; That's the problem of spawning the handler in your netlink reader. The netlink reader has to open the pipe for writing in non-blocking mode, then write a complete message as a single chunk, and check the write for failure (you always need to check and handle that), done. If the open/write to the pipe is not possible, the device plug system has gone away and needs a restart, so let the netlink listener die (unusual condition). One critical condition should be watched and handled: when the pipe is full and the write (poll for write) hits its timeout, what then? ... but this is no different than in your solution. -- Harald
Re: RFD: Rework/extending functionality of mdev
On 14/03/2015 01:20, Harald Becker wrote: I'm using non-blocking I/O and expect handling of failure situations, e.g. writing into the pipe fails with EAGAIN: poll waits until writing is possible, with a timeout, then redo the write. Hm, after checking, you're right: the guarantee of atomicity applies with nonblocking IO too, i.e. there are no short writes. Which is a good thing, as long as you know that no message will exceed PIPE_BUF - and that is, for now, the case with uevents, but I still don't like to rely on it. Laurent, you're still stuck on netlink! Using a named pipe is a requirement for the hotplug helper stuff; how else should they get access to the pipe, if not via a named pipe? And what is the difference between fifos and pipes? By "pipe", I mean an anonymous pipe. And an anonymous pipe, created by the netlink listener when it forks the event handler, is clearly the right solution, because it is private to those two processes. With a fifo, aka named pipe, any process with the appropriate file system access may connect to the pipe, and that is a problem: - for writing: why would you need another process to write into the pipe? You have *one* authoritative source of information, which is the netlink listener; any other source is noise, or worse, malevolent. - for reading: having several readers on the same pipe is the land of undefined behaviour. You definitely don't want that. - generally speaking, fifos are tricky beasts, with weird and unclear semantics around what happens when the last writer closes, different behaviours wrt kernel and even *kernel version*, and more than their fair share of bugs. Programming a poll() loop around a fifo is a lot more complicated, counter-intuitive, and brittle than it should be (whereas anonymous pipes are easy as pie. Mmm... pie.) I'm talking about the netlink because it's a natural source of streamed uevents, which is exactly what you want to give to a long-lived uevent handler such as mdev -i.
I really have no idea why you're fixated on that fifo idea when all the pieces that you need already exist. A netlink listener forking a long-lived event handler and transmitting events to it via an anonymous pipe is the simplest design that will accomplish what you want. Pardon my bluntness, but I think you've been in write-only mode since the beginning of the discussion, and it is irritating. You are certainly very experienced, but you are not the only one who knows how Unix works; you are convinced we are not understanding your great plan, or that we want to prevent you from organizing your computers the way you want, or that you need to explain to us how fifos and super-servers work; none of that is true. This is the busybox mailing-list, we are not as clueless or malevolent as some people you may have had the displeasure of working with; and would you believe, some of us know how to design Unix software that does not need a power plant to run. Please take a look at my s6-uevent-listener/s6-uevent-spawner, or at Natanael's nldev/nldev-handler. The long-lived uevent handler idea is a *solved problem*. Take the code, busyboxify it if you want, make it into a single binary, whatever; all the tools you need are there. You have: - a short-lived uevent handler, registrable as a hotplug helper: mdev - a long-lived uevent handler: mdev -i (it's not there yet, but AIUI that's what you wanted to work on) - a netlink listener: s6-uevent-listener or nldev - a coldplug trigger: mdev -s - just in case someone would want that (and until mdev -i actually exists, we do want that), you even have long-lived programs suitable to be spawned by a netlink listener, that themselves spawn a helper such as mdev for every event: s6-uevent-spawner or nldev-handler. The advantage of this construct over simply registering mdev in /proc/sys/kernel/hotplug is that it serializes events. It does not get any simpler, or more modular, than this set of tools.
Your original plan was to: - write mdev -i: I think it's a good idea. - modify mdev -s to add more functionality than just triggering a coldplug: I don't think it's good design but I don't care as long as I can configure it out. Other people have also answered. The answers may not have been the ones you were looking for, but you wanted feedback, you got feedback. I'm not interested in providing 15 APIs to do the same thing. Users don't like the netlink ? Tough, if they want serialized uevents. What would you do if your kid wanted to drive a car but said he didn't like steering wheels, would you build him a car with a joystick ? I provided you with clear designs and working code. So did other people. (You said code is premature at this point. I'm sorry, but no, code is not premature when the problem is solved, and if you're not convinced, please simply study the code, which is extremely short.) Now it's all up to you. I would like to see a mdev -i, can you work on it ? If you prefer to keep beating around the bush and smoking crack
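The listener-spawns-handler pattern described above (anonymous pipe, fork, handler reads its stdin) can be sketched in a few lines. Python is used as a stand-in for the C the real tools are written in, and `wc -c` plays the role of the not-yet-written mdev -i:

```python
import os

def spawn_handler(argv):
    """Fork a child with the pipe's read end on its stdin.

    Returns (pid, write_fd); the parent (the netlink listener in the
    design above) streams events into write_fd. The pipe is anonymous,
    so it is private to these two processes.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                # child: the long-lived event handler
        os.close(w)
        os.dup2(r, 0)           # events arrive on stdin
        os.close(r)
        os.execvp(argv[0], argv)
    os.close(r)                 # parent keeps only the write end
    return pid, w

# Demo: `wc -c` stands in for the handler (mdev -i does not exist yet).
pid, w = spawn_handler(["wc", "-c"])
os.write(w, b"add@/devices/foo\0")
os.close(w)                     # EOF tells the handler to finish
_, status = os.waitpid(pid, 0)
```

Closing the write end is what lets the handler see EOF and exit cleanly; a supervisor above the listener handles the restart case.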
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 21:32, Michael Conrad wrote: Are you suggesting even the netlink mode will have a process reading the netlink socket and writing the fifo, so another process can then process the fifo? The netlink messages are already a simple protocol, just use it as-is. Pass the You got the function of the fifo manager (or supervisor) wrong. This little process never reads or touches the fifo data; its purpose is to fire up the parser (pipe consumer) when any gathering part has written something into the pipe and there is no running parser process. In addition this supervisor may spawn a failure_script, when the parser aborts unexpectedly. Have you ever used tcpsvd? This piece opens a network socket, then accepts incoming connections and passes the socket to a spawned service process. The fifo manager does the same for the named pipe. The data flow is: netlink daemon - pipe - parser, or hotplug helper - pipe - parser. The new code behaves exactly as the old code. When used as a hotplug helper, it suffers from parsing the conf file for each event. My approach is a splitting of the old mdev into two active threads, which avoids those problems even for those who like to stay with kernel hotplug. Then it sounds like indeed, you are introducing new configuration steps for the old-style hotplug helper? i.e. where does the fifo live? At a simple default: /dev/.xdev-pipe, because any reading of such parameters in the hotplug helper would slow down the operation. Remember, hotplug helpers are spawned in parallel. Who owns it? The only user who's allowed to do netlink operations, load modules, create any device nodes, etc. - root:root. What security implications does this have? Mode of the fifo will be 0600 = rw-------. Who starts the single back-end mdev processor? This is the job of the fifo manager (or named pipe supervisor). The processor, as you call it, is started on demand, when data is written to the fifo; the processor has to die when idle for some time.
If started from the hotplug-helper, who ensures that only one gets started? Started from the hotplug helper? The helper won't ever start anything, just:

hotplug helper:
  gather event information
  sanitize message
  open named pipe for writing
    (ok, if this open fails seriously, we are in big trouble)
    (true for many other such operations)
    (to be discussed what's the best failure handling for this)
  if pipe is open, (safe) write the event message
    (safe means: in a loop, checking for success)
  exit 0

netlink reader:
  open named pipe
    (for failures here I have already added an option)
    (will spawn a given script with the failure reason)
    (or otherwise retries some times, then dies)
  open netlink socket
  in an endless loop:
    wait for messages arriving
    sanitize message
    (safe) write the event message into the pipe

fifosvd:
  create named pipe (fifo)
  open fifo for reading and writing in non-blocking mode
  in an endless loop:
    wait for data arriving in pipe (poll)
    spawn the parser process, redirecting stdin from the fifo
    wait for exit of the spawned process
    if it did not exit successfully:
      spawn the given failure script with arguments
      if the failure script exits unsuccessfully, then die

parser:
  read conf file into memory table
  while read next message from stdin with timeout:
    sanity checks of message (paranoia)
    look up device entry in memory table
    do the required operation for the message

Is this better for you? I really hate code hacking before I'm able to finish planning. If people have existing systems using hotplug-helper mdev, you can't just change the implementation on them in a way that requires extra configuration. Which extra configuration? Everyone who has commented on this thread so far agrees with that. You definitely misunderstand my approach and how it works! -- Harald
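The heart of the fifosvd flow described in this message — hold the fifo open O_RDWR so it survives writers and parsers coming and going, poll for data, then spawn a parser with stdin redirected from the fifo — can be sketched as one loop iteration. Python is a stand-in; the fifo path and the `head -c` parser are placeholders for demonstration:

```python
import os
import select
import subprocess
import tempfile

# One iteration of the fifosvd loop. Opening the fifo O_RDWR means the open
# never blocks and the fifo never loses its last reader/writer, which is the
# race-avoidance trick described above.
fifo = os.path.join(tempfile.mkdtemp(), "xdev-pipe")
os.mkfifo(fifo, 0o600)
fd = os.open(fifo, os.O_RDWR | os.O_NONBLOCK)

os.write(fd, b"add@/devices/foo\0")      # a gathering part delivers one event

p = select.poll()
p.register(fd, select.POLLIN)
out = b""
if p.poll(1000):                         # data arrived: fire up the parser,
    # `head -c 17` stands in for `xdev parser` reading from the fifo
    out = subprocess.run(["head", "-c", "17"], stdin=fd,
                         capture_output=True).stdout
```

In the real design the supervisor would loop back to poll() after the parser exits, and spawn a failure script on a bad exit code; none of that bookkeeping is shown here.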
Re: RFD: Rework/extending functionality of mdev
Hi, my original intention was to replace mdev with an updated version, but as there were several complaints from people who would like to continue using the old mdev, I'm thinking about an alternative. ... but the essential argument came from James: fail-safe operation. In my last summary I used the name xdev, just as a placeholder to distinguish it from the current mdev. I don't think the name xdev would be a good choice to stay with, neither is nldev, as my approach allows including / using different mechanisms, ... ... so what would be the best choice for a name for this updated implementation of the device system management command? With a different name, we can just leave the mdev code as is, and put the new code in a new applet (no code sharing except the usual libbb). Then you can opt in whichever version you like, or even both, and choose at runtime which one to use. ... but if someone now complains about the big code size overhead when two device managers get included without code sharing, I send him a: kill -s 9 -1 ... and I won't later change the name of the new xdev implementation to mdev, because someone complains and wants to use the newer version but stay with the name mdev.
So call the command xdev for now, to distinguish it from current mdev operation:

xdev -i - do the configured initial setup stuff for the device file system (this is optional, but I like the one-shot startup idea)
xdev -f - start up the fifo manager, if none is running (manual use is special purpose only)
xdev -k - disable the hotplug handler, kill a possibly running netlink daemon (for internal and special purpose usage; kill is not perfect yet, there is a race condition when switching mechanisms - needs more thinking)
xdev -p - (changed the parameter due to criticism) select the kernel hotplug handler mechanism (auto-includes -f and -k)
xdev -n - select the netlink mechanism and start the daemon (auto-includes -f and -k)
xdev -c - do the cold plug initiation (triggering uevents) (also auto-includes -f)
xdev -s - do the cold plug as mdev -s does (also auto-includes -f)
xdev (no parameter) - can be used as the kernel hotplug helper
xdev netlink - the netlink reader daemon (this is for internal use)
xdev parser - the mdev.conf parser / device operation process (this is for internal use)

Command parsing will be stupid: if it is not an option, only the first character is checked, so xdev pumuckl is the same as xdev parser. Each of the mentioned parts except the fifo startup can easily be opted out of in the config, but otherwise wastes only some bytes in the binary. The fifo manager (named pipe supervisor) daemon itself is not included in this list, as a general fifosvd as a separate applet seems to be the better place (it is just used internally by -f). The current mdev -s may become either (other combinations possible):

xdev -s = do sysfs file scanning as mdev -s does, but use the new back end
xdev -pc = kernel hotplug mechanism, trigger the cold plug (uses xdev as hotplug handler)
xdev -nc = netlink mechanism, trigger the cold plug (starts the xdev netlink reader daemon)

The only other change in the init scripts shall be to remove the old setting of mdev as hotplug helper in the kernel completely (done implicitly by -p).
All those may be combined with -i; then at first the configured setup operations are performed, and thereafter the other requested actions. That does *not* mean xdev -i does any binary-encoded setup stuff. It shall read the config file (suggestion: first try /etc/xdev-init.conf, then fall back to /etc/xdev.conf when the former does not exist) and invoke the required operations for the configured setup lines. The setup lines are only used by xdev -i and otherwise ignored by xdev parser (like comments). ... more brainstorming: Just for those who may need such a feature: If you start the fifo supervisor manually, you can arrange to start up a different back end parser. This may be used to send all device event messages to a file:

#!/bin/busybox sh
tee -a /dev/.xdev-log | exec xdev parser

When this wrapper is used as the back end, it will catch and append a copy of each event message to /dev/.xdev-log, which could itself be a named pipe, to put messages in a file and watch the file size to rotate files when required. ... but I'm thinking of adding an xdev -l[LOG_FILE], which overrides -f and sets up the fifo supervisor to do the logic of the above wrapper, but without invoking an extra shell. Some neat trick:

xdev -l/dev/tty9 -pc
  start fifo supervisor
  set xdev as hotplug helper
  trigger the cold plug events

Beside normal parsing, a copy of all event messages is written to the tty. And again: This is not for normal usage, only for debugging purposes and those interested in lurking at their device messages. ... as a maybe: xdev -e[FAILURE_SCRIPT] - spawn the failure script when an operation has serious problems
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 21:43, Laurent Bercot wrote: Except you have to make the writes blocking, which severely limits what the writers can do - no asynchronous event loop. For a simple cat-equivalent between the netlink and a fifo, it's enough, but it's about all it's good for. And such a cat-equivalent is still useless. I'm using non-blocking I/O and expect handling of failure situations, e.g. writing into the pipe fails with EAGAIN: poll waits until writing is possible, with a timeout, then redo the write. It's still a very bad idea to allow writes from different sources into a single fifo when there's only one authoritative source of data, in this case the netlink. Ohps!? one netlink daemon - pipe - parser, or many hotplug helpers - pipe - parser. If a process wants to read uevents from a pipe, it can simply read from an anonymous pipe, being spawned by the netlink listener. That's what my s6-uevent-listener and Natanael's nldev do, and I agree with you: there's simply no need to introduce fifos into the picture. Laurent, you're still stuck on netlink! Using a named pipe is a requirement for the hotplug helper stuff; how else should they get access to the pipe, if not via a named pipe? And what is the difference between fifos and pipes? From man 7 pipe: ---snip--- Pipes and FIFOs (also known as named pipes) provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe. ---snip--- You see? So why is a pipe good (your choice) and a fifo bad (mine)? They differ only in how you get access to the descriptors. -- Harald
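The write-side failure handling Harald describes (non-blocking write; on EAGAIN, poll until writable with a timeout, then redo the write) can be sketched as follows. Python stands in for the C implementation, and the timeout value is an arbitrary choice:

```python
import os
import select

def safe_fifo_write(fd, msg, timeout_ms=5000):
    """Non-blocking single-chunk write with poll-and-retry on EAGAIN.

    Returns False if the pipe stays full past the timeout - the critical
    "pipe full, nobody reading" condition mentioned in the thread.
    """
    while True:
        try:
            os.write(fd, msg)
            return True
        except BlockingIOError:            # EAGAIN: pipe currently full
            p = select.poll()
            p.register(fd, select.POLLOUT)
            if not p.poll(timeout_ms):     # never became writable: give up
                return False

# Demo: fill a pipe so the timeout path triggers.
r, w = os.pipe()
os.set_blocking(w, False)
try:
    while True:
        os.write(w, b"x" * 4096)           # fill the pipe buffer
except BlockingIOError:
    pass
full = safe_fifo_write(w, b"add@/devices/foo\0", timeout_ms=50)
```

With no reader draining the pipe, `full` comes back False after the 50 ms timeout; once a reader drains it, the same call would succeed.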
Re: RFD: Rework/extending functionality of mdev
On 11/03/2015 08:45, Natanael Copa wrote: The netlink listener daemon will need to deal with the event handler (or parser as you call it) dying. I mean, the handler (the parser) could get some error, out of memory, someone broke mdev.conf or anything that causes mdev to exit. If the child (the handler/parser) dies, a new pipe needs to be created and the handler/parser needs to be re-forked. With that in mind, wouldn't it be better to have the timer code in the handler/parser? When no new messages come from the pipe within a given time, the handler/parser just exits. I've thought about that a bit, to see if there really was value in making the handler exit after a timeout. And it's a lot more complex than it appears, because you then get respawner design issues, the same that appear when you write a supervisor. What if the handler dies too fast and there are still events in the queue? Should you respawn the handler instantly? That's exactly the kind of load you're trying to avoid by having a (supposedly) long-lived handler. Should you wait for a bit before respawning the handler? How long are you willing to delay your events? It is necessary to ask these questions, and have the mechanisms in place to handle that case - but the case should actually never happen: it is an admin error to have the event handler die too fast. So it's code that should be there but should never be used; and it's already there in the supervisor program that should monitor your netlink listener. So my conclusion is that it's just not worth it to allow the event handler to die. s6-uevent-listener considers that its child should be long-lived; if it dies, s6-uevent-listener dies too with an error message. It will be picked up by its supervisor (and the new instance will also respawn the handler). I'm happy to trade a bit of swap space for a significant decrease in the amount of required code.
-- Laurent
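The "handler exits when idle" behaviour debated above amounts to a poll loop on stdin with a timeout. A minimal sketch (Python stand-in; the idle timeout and the NUL event framing are assumptions, not busybox behaviour):

```python
import os
import select

def run_handler(fd=0, idle_ms=2000):
    """Consume NUL-terminated events from fd; exit once idle past the timeout.

    Returns the number of events handled. In the fifo design, the supervisor
    would respawn this process as soon as new data arrives in the pipe.
    """
    p = select.poll()
    p.register(fd, select.POLLIN)
    handled = 0
    while True:
        if not p.poll(idle_ms):
            return handled          # idle timeout: exit voluntarily
        data = os.read(fd, 65536)
        if not data:
            return handled          # writer side closed: also exit
        handled += data.count(b"\0")

# Demo on an ordinary pipe standing in for the fifo.
r, w = os.pipe()
os.write(w, b"add@/devices/foo\0remove@/devices/foo\0")
os.close(w)
count = run_handler(r, idle_ms=100)
```

This is exactly the respawn-on-demand loop whose corner cases (fast death with events still queued, respawn delays) Laurent argues are not worth the complexity.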
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 00:05, Michael Conrad wrote: On 03/12/2015 04:32 PM, Harald Becker wrote: On 12.03.2015 19:38, Michael Conrad wrote: On 3/12/2015 12:04 PM, Harald Becker wrote: but that one will only work when you either use the kernel hotplug helper mechanism, or the netlink approach. You drop out those who can't / don't want to use either. ...which I really do think could be answered in one paragraph :-) If the netlink socket is the right way to solve the forkbomb problem that happens with hotplug helpers, then why would anyone want to solve it the wrong way? I don't understand the need. To clarify (adding in here #0, to not forget cold plug and semi-automatic handling): 1 - kernel-spawned hotplug helpers is the traditional way, 2 - a netlink socket daemon is the right way to solve the forkbomb problem ACK, but #2 blocks usage for those who like / need to stay at #1 / #0 3 - kernel-spawned fifo-writer, with the fifo read by a hotplug daemon, is solving it the wrong way. NO! This is splitting the operation of a big process into different threads, using an interprocess communication method. Using a named pipe (fifo) is the proven Unix way for this ... and it allows #2 without blocking #1 or #0. ohh, good question! ... ask Isaac! (answer in one paragraph?) What I hear Isaac say is: leave #1 (the traditional way) alone, I want to keep using it. I agree with him that it should stay. But I would choose to use #2 if it were available. I am asking the purpose of #3. The purpose of #3 is splitting the hotplug handler needed for the traditional #1 from the suffering part, putting the latter, with some rearranging, into a separate process, using a proven interprocess communication method (IPC). Now you may use whichever mechanism you like, and whichever mechanism you choose, it will benefit from the overall speed improvement (as events tend to arrive in bursts, and there is no more extra parsing for the 2nd and following events). That's it.
I try to provide the work to step to #3, allowing everybody to use the mechanism he likes; Isaac on the other hand blocks any innovation, forcing me and others to either stay at #1 too, or choose a different / external program (with the consequence of code duplication or complex code sharing, the opposite of clarity). So I think your answer to my original question is: the fifo design is a way to have #1 and #2 without duplicating code. The fifo design is a proven method to split the operation of complex processes into smaller threads, using an interprocess communication method. In that case, I would offer this idea: All you do is throw in complex code sharing and the need to choose a mechanism ahead at build time, to allow for switching to some newer stuff ... but what about pre-generated binary versions: which mechanism shall be used in the default options, which mechanism shall be offered? With netlink active (surely the proven and better way for the job), you hit those like Isaac. With netlink disabled, spreading newer technology widely is usually blocked (not talking about some experts who know how to build their own BB version). So why not allow some innovation, and let the user choose which mechanism to use? What is wrong with this intention? I neither want to reinvent the wheel, nor go the udev way and create a big monolithic block, but I would like the ability to set up the system the way I like, without blocking others from using the plug mechanism they like. -- Harald
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 10:30, Guillermo Rodriguez Garcia wrote: There are many configuration options in BB that must be defined at build time. I don't see why this one would be different. You can activate both as the default (at the cost of some bytes of code size overhead), and let the user of the binary decide which mechanism he prefers, or even flip temporarily (without system interruption). Users that want a functional solution will probably not care much about the underlying implementation. Exactly; that means using only one mechanism is forcing those users to do it in a specific way, with all sorts of consequences. Those who want to tailor BB to fit their preferences most likely don't have a problem with building their own BB. Ok, and what's then wrong with my intended approach? You will be able to opt out of most of the parts, if you like, even the parser / handler (think of wanting to handle device management in a script without reinventing the event plug mechanism). It is a modular system; just tie together those functions you like. A device handler could be:

#!/bin/sh
while read -t TIMEOUT message
do
    # now split the received message ...
    # and set up your device node entries ...
done
exit 0

... or think vice versa: Let someone find a new, ultra-good mechanism, but still want to use the conf parser / device handler back end. So opt out of the plug mechanisms and leave in the parser, then use a small external program with the ultra-new mechanism (until it may be added as another optional mechanism in Busybox, like netlink). A modular system: put together the required parts. The only caveat is that you need to fire up some kind of service daemon for the device management system ... else the start of this service daemon (fifo manager is just another name for it) needs to be coupled with some other part ... which runs against Laurent's wishes, who keeps poking me about clarity and functional separation.
Re: RFD: Rework/extending functionality of mdev
2015-03-13 11:19 GMT+01:00 Harald Becker ra...@gmx.de: On 13.03.2015 10:30, Guillermo Rodriguez Garcia wrote: There are many configuration options in BB that must be defined at build time. I don't see why this one would be different. You can activate both as the default (at the cost of some bytes of code size overhead), and let the user of the binary decide which mechanism he prefers, or even flip temporarily (without system interruption). Sure. But this same argument could also be applied to many other options in BB which currently are defined at build time. I am just saying that in most other areas, Busybox does not work like this. Users that want a functional solution will probably not care much about the underlying implementation. Exactly; that means using only one mechanism is forcing those users to do it in a specific way, with all sorts of consequences. I understand your argument. You are saying that users should be able to choose at runtime. What I say is that my impression is that most users belong to one of the following two groups: those who don't really care, and those who are happy making this choice at build time. Guillermo Rodriguez Garcia guille.rodrig...@gmail.com
Re: RFD: Rework/extending functionality of mdev
On Thu, 12 Mar 2015 17:26:38 + Isaac Dunham ibid...@gmail.com wrote: On Thu, Mar 12, 2015 at 04:04:41PM +0100, Harald Becker wrote: No, you misunderstand. Read my proposal below and tell me why this won't do what you're after, OTHER than the way mdev works now is broken/wrong, since that *isn't* universally accepted. As you stipulated about your design, applet and option names can be changed easily; but when I say new applet, I mean to indicate that this should be separate from mdev. * mdev (no options) ~ as it works now (i.e., a hotplugger that parses mdev.conf itself) * mdev -s (scan sysfs) ~ as it works now, or could feed the mdev parser Yes. * mdev -i/-p (read events from stdin) The mdev parser, accepting a stream roughly equivalent to a series of uevent files with a separator following each event[1]. To make it read from a named pipe/fifo, the tool that starts it could use dup2(). Yes, this is an option. I am not convinced it is the best solution though. I think that the changes in mdev might be more intrusive than the mdev maintainer feels comfortable with. Still, I don't have any better ideas. While solving this, I would also like to find a way to load the MODALIAS without forking. One option is to add a modprobe -i which reads modaliases from stdin with a timeout. Maybe that is lower-hanging fruit than mdev -i? * new applet: nld Netlink daemon that feeds the mdev parser. I have implemented a proof-of-concept for this in case someone wants to experiment with mdev -i: http://git.alpinelinux.org/cgit/ncopa/nldev/tree/nldev.c I am not convinced that this should be implemented as a busybox applet. It would be nice if it was, though. * new applet: fifohp Your hotplug helper, fifo watch daemon that spawns a parser, and hotplug setup tool. I had actually thought that it might work at least as well if, rather than starting a daemon at init, the fifo hotplugger checks if there's a fifo and *becomes* the fifo watch daemon if needed.
Also, I was thinking in terms of writing to a pipe because that lets us make sure events get delivered in full (i.e., what happens if mdev dies halfway through reading an event?) Yes. Partially written/read events need to be handled properly, as Laurent pointed out too. This way, + mdev is the only applet parsing mdev.conf; + all approaches to running mdev are possible; + it's easy to switch from mdev to the new hotplugger, while still having mdev available if the new hotplugger breaks; + mdev is only responsible for tasks that involve parsing mdev.conf. And people who want the change don't have to do more than your proposal would require. This is the direction I want to go, yes. [1] The format proposed by Laurent uses \0 as a line terminator; I think it might be better to use something that's more readily generated by standard scripting tools from uevent files, which would make it possible to use cat or env to feed the mdev parser. I liked the \0 as event terminator. It's simple. -nc
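A parser for the \0-terminated stream discussed in this message is only a few lines. This sketch (Python stand-in) assumes each event is a block of KEY=VALUE lines, roughly the content of a uevent file, followed by a NUL; the exact framing is one of the open questions above, and the sample keys are the usual kernel uevent ones:

```python
def parse_events(stream):
    """Split a byte stream of NUL-terminated uevent blocks into dicts."""
    events = []
    for chunk in stream.split(b"\0"):
        if not chunk:
            continue                      # skip the empty tail after the last NUL
        fields = {}
        for line in chunk.decode().splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                fields[key] = value
        events.append(fields)
    return events

raw = (b"ACTION=add\nDEVPATH=/devices/foo\nMODALIAS=usb:v1D6Bp0002\0"
       b"ACTION=remove\nDEVPATH=/devices/foo\0")
events = parse_events(raw)
```

Because the terminator sits outside the key/value lines, the same stream could indeed be produced by cat-ing uevent files with a printf '\0' between them, which is Isaac's point about scriptability.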
Re: RFD: Rework/extending functionality of mdev
Hi Harald, 2015-03-13 8:25 GMT+01:00 Harald Becker ra...@gmx.de: On 13.03.2015 00:05, Michael Conrad wrote: In that case, I would offer this idea: All you do is throw in complex code sharing and the need to choose a mechanism ahead at build time, to allow for switching to some newer stuff ... but what about pre-generated binary versions: which mechanism shall be used in the default options, which mechanism shall be offered? With netlink active (surely the proven and better way for the job), you hit those like Isaac. With netlink disabled, spreading newer technology widely is usually blocked (not talking about the few experts who know how to build their own BB version). So why not allow some innovation, and let the user choose which mechanism to use? What is wrong with this intention? I neither want to reinvent the wheel, nor go the udev way and create a big monolithic block, but I would like the ability to set up the system the way I like, without blocking others from using the plug mechanism they like. Michael's proposal would allow you to do what you want to do, since you are one of those experts who know how to build their own BB version. So what's wrong with his proposal? Guillermo
Re: RFD: Rework/extending functionality of mdev
On Thu, 12 Mar 2015 17:31:36 +0100 Harald Becker ra...@gmx.de wrote: Every gathering part grabs the required information, sanitizes, serializes and then writes some kind of command to the fifo. The fifo management (a minimalist daemon) starts a new parser process when there is none running and watches its operation (failure handling). If you are talking about named pipes (created by mkfifo) then your fifo approach will break here. Once the writing part of the fifo (eg a gathering part) is done writing and closes the writing end, the fifo is consumed and needs to be re-created. No other gathering part will be able to write anything to the fifo. ??? You don't understand the operation of named pipes! Any program (with appropriate rights) may open the named pipe (fifo) for writing. As long as the data for one event is written in one big chunk, it won't interfere with possible parallel writers. If the fifo device is closed by a writer, this does not mean it vanishes; just reopen it for writing and write the next event chunk. What I meant was that the reader needs to reopen it too. Basically what you describe as a fifo manager sounds more like a bus, like dbus. It is the old and proven inter-process communication mechanism of Unix, nothing new. And "fifo manager" sounds really big; it is a really minimalistic daemon, with primitive operation (it never touches the data in the fifo, nor even reads that data). Its main purpose, besides creating the fifo, is to fire up a conf parser / device operation process when required, and to react to their failure (spawn a script with args). And connect the parser to the fifo? My point is that you need a minimalist daemon that is always there. Why not let that daemon listen for netlink events instead? Since it is as simple as you say, why not write a short demo? It may help me understand what you really mean. -nc
Re: RFD: Rework/extending functionality of mdev
2015-03-13 11:54 GMT+01:00 Harald Becker ra...@gmx.de: On 13.03.2015 11:25, Guillermo Rodriguez Garcia wrote: I understand your argument. You are saying that users should be able to choose at runtime. What I say is that my impression is that most users belong to one of the following two groups: Those who don't really care, and those who are happy making this choice at build time. Putting people into either-or categories is not very handy; humans are too different, and you won't predict the exact needs of the next person coming around. So is either blocking any innovation, or forcing one half of your either-or group to do things in a specific way (whether they doubt it or not), your way? Not mine! Perhaps my message is not getting across. What I am saying is that I am not sure your suggestion is actually something others would find useful. Other than the fact that you like it yourself, I don't see a lot of enthusiasm about it. Guillermo
Re: RFD: Rework/extending functionality of mdev
On 3/13/2015 9:48 AM, Harald Becker wrote: On 13.03.2015 12:29, Michael Conrad wrote: On 3/13/2015 3:25 AM, Harald Becker wrote: 1 - kernel-spawned hotplug helpers is the traditional way, 2 - netlink socket daemon is the right way to solve the forkbomb problem ACK, but #2 blocks usage for those who like / need to stay at #1 / #0 In that case, I would offer this idea: All you do is throw in complex code sharing and the need to choose a mechanism ahead at build time, to allow for switching to some newer stuff ... but what about pre-generated binary versions: which mechanism shall be used in the default options, which mechanism shall be offered? Please review it again. My solution solves both #1 and #2 in the same binary, with no code duplication. First complex code reuse, and then: How will you do it without suffering from the hotplug handler problem as current mdev does? I don't see that you try to handle this problem. My solution is to let the kernel hotplug handler mechanism also benefit, and avoid that parallel parsing for each event. ... besides that, this close / open_netlink looks suspicious; looks like a possible race condition. I thought pseudocode would be clearer than English text, but I suppose my pseudocode is still really just English... Maybe some comments will help. The new code would not be run like a hotplug helper; it would be run as a daemon, probably from a supervisor. But the old code is still there and can still be run as a hotplug helper.

mdev_main() {
    read_options();
    load_config();
    // if user requests --netlink mode, we act like a daemon
    if (option_netlink) {
        // If --netlink-on-stdin then netlink is open for us already;
        // if not, then we need to create our netlink socket
        if (!option_netlink_on_stdin) {
            close(0);
            // the new socket will now be file descriptor 0
            open_netlink_socket();
        }
        // use 'select()' to see if a new netlink message is ready.
        // if the user gave us a --timeout then we exit if no new
        // netlink message arrives in a certain amount of time
        while (select([0], timeout)) {
            if (recv(0, message)) {
                // A netlink message is a list of variables. We call 'setenv' for each.
                apply_env_from_message(message);
                // Now we have all the hotplug variables, so call the old code.
                process_request();
            }
            // keep running in a loop until timeout (or forever if no timeout)
        }
    } else
#endif
        process_request();
}
Re: RFD: Rework/extending functionality of mdev
On 3/13/2015 3:25 AM, Harald Becker wrote: 1 - kernel-spawned hotplug helpers is the traditional way, 2 - netlink socket daemon is the right way to solve the forkbomb problem ACK, but #2 blocks usage for those who like / need to stay at #1 / #0 In that case, I would offer this idea: All you do is throw in complex code sharing and the need to choose a mechanism ahead at build time, to allow for switching to some newer stuff ... but what about pre-generated binary versions: which mechanism shall be used in the default options, which mechanism shall be offered? Please review it again. My solution solves both #1 and #2 in the same binary, with no code duplication. I suggested wrapping #2 in an ifdef for the people who don't have netlink at all, such as on BSD, and also anyone who doesn't want the extra bytes. -Mike
Re: RFD: Rework/extending functionality of mdev
On 3/13/2015 3:25 AM, Harald Becker wrote: This is splitting operation of a big process in different threads, using an interprocess communication method. Using a named pipe (fifo) is the proven Unix way for this ... and it allows #2 without blocking #1 or #0. Multiple processes writing into the same fifo is not a valid design. Stream-writes are not atomic, and your message can theoretically get cut in half and interleaved with another process writing the same fifo. (in practice, this is unlikely, but still an invalid design) If you want to do this you need a unix datagram socket, like they use for syslog. It is also a broken approximation of netlink because you don't preserve the ordering that netlink would give you, which according to the kernel documentation was one of the driving factors to invent it. If someone really wants a netlink solution they will not be happy with a fifo approximation of one.
Re: RFD: Rework/extending functionality of mdev
On Thu, 12 Mar 2015 19:05:01 -0400 Michael Conrad mcon...@intellitree.com wrote: In that case, I would offer this idea: Thanks. Your pseudo code makes perfect sense to me. ... 3. Then to support the ability to launch mdev connected to a netlink socket that already exists, and time out when not used,

mdev_main() {
    read_options();
    load_config();
#ifdef FEATURE_MDEV_NETLINK
    if (option_netlink) {
        if (!option_netlink_on_stdin) {
            close(0);
            open_netlink_socket();
        }
        while (select([0], timeout)) {
            if (recv(0, message)) {
                apply_env_from_message(message);
                process_request();
            }
        }
    } else
#endif
        process_request();
}

I have the feeling (without digging into the busybox code) that making mdev suitable as a long-lived daemon will not be that easy. I suspect there is a lot of error handling that will just exit. A long-lived daemon would need to log and continue instead. If I am right about that, then those who want the option_netlink case will probably need to run the netlink listener as a separate process anyway, and instead set the timeout to never. Then we end up with:

mdev_main() {
    read_options();
    load_config();
#ifdef FEATURE_MDEV_STDIN
    while (select([0], timeout)) {
        if (recv(0, message)) {
            apply_env_from_message(message);
            process_request();
        }
    }
#else
    process_request();
#endif
}

This would be enough for me. I think this will be even smaller than what you propose with the fifo. It will do netlink, it will do the traditional hotplug helper, and even allow the trick where a tiny daemon monitors netlink and can start mdev in daemon mode on demand. -Mike
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 12:41, Michael Conrad wrote: On 3/13/2015 3:25 AM, Harald Becker wrote: This is splitting the operation of a big process into different threads, using an inter-process communication method. Using a named pipe (fifo) is the proven Unix way for this ... and it allows #2 without blocking #1 or #0. Multiple processes writing into the same fifo is not a valid design. Who told you that? It is *the* proven N-to-1 IPC in Unix. Stream-writes are not atomic, and your message can theoretically get cut in half and interleaved with another process writing the same fifo. (in practice, this is unlikely, but still an invalid design) This is not completely correct; picked out of the Linux pipe manual page (man 7 pipe): ---snip--- O_NONBLOCK disabled, n <= PIPE_BUF: All n bytes are written atomically; write(2) may block if there is not room for n bytes to be written immediately ---snip--- As long as the message written to the pipe/fifo is no larger than PIPE_BUF, the kernel guarantees atomicity of the write; message mixing only happens when you write single messages larger than PIPE_BUF, or use split writing (e.g. fprintf without setting line-buffered mode). PIPE_BUF is at least 512 (the POSIX minimum), and 4k on Linux. If you want to do this you need a unix datagram socket, like they use for syslog. Socket overhead is higher than writing to a pipe: not only in code size, but more so in the CPU cost of passing the messages. It is also a broken approximation of netlink because you don't preserve the ordering that netlink would give you, which according to the kernel documentation was one of the driving factors to invent it. Sure. You say netlink is the better solution; I say netlink is, too, but next door you may find someone who dislikes netlink usage. We are not living in a perfect world. Ordering is handled differently in mdev; that shall stay as is.
My approach can't solve every single problem of this method, but that is up to those who like to stay with it; still, they should gain from the speed improvement, and have fewer problems from race conditions (each device operation is done without mixing with other device operations, as under pure parallelism). Additionally the hotplug helper speed is increased and it does a really early exit compared to current mdev (or your approach). This should reduce system pressure and event reordering, but will indeed not avoid it (needs to be synchronized) ... but I got a different idea: I have heard the kernel provides a sequence number, which is used in mdev to do synchronization. Maybe we should just send the messages to the pipe as fast as possible, but prefix them with the event sequence number. The parser reads a message and checks the sequence number, pushing reordered messages onto a back list until the right message arrives (or some timeout - as done in mdev, but without needing to read / write a file). Oh, I think the sequence number info is in the docs/mdev.txt description, including how this is done in mdev. If someone really wants a netlink solution they will not be happy with a fifo approximation of one. You missed the fact that my approach allows free selection of the mechanism. Choosing netlink means using netlink, as it should be. The event listener part is as small as possible and writes to the pipe, which fires up a parser / handler to consume the event messages. Where is there an approximation? The kernel hotplug helper mechanism is a different method, but also available for those who like to use it. Either one will leave only some unused code (if not opted out in the config). The difference is that the default config can include both mechanisms in pre-built binaries. The user can choose and test the mechanism he wants, and then possibly build a specific version and opt out the unwanted stuff. -- Harald
Re: RFD: Rework/extending functionality of mdev
There are interesting technical points in this discussion, but it turns out to be mostly about philosophy and frustration. Harald, there are two points in your arguments which make no sense to me: On 12/03/2015 17:31, Harald Becker wrote: ... because there are people who dislike using netlink and want to use the kernel hotplug helper mechanism. That's it. People's preferences differ. Opt out the functions you dislike in the BB config. Hotplug is KISS; it is stupid, maybe, but it is so simple that you can probably do the job with a script. The same serialization you propose to implement in user space by means of several processes, a named pipe and still the fork bomb, has been implemented in the kernel without the fork bomb: it is called netlink. These people you are talking of, who would like to see hotplug serialized but do not want netlink, do they really exist? This set of people is most likely the empty set. In case they really exist, then they must be idiots, and then, well, should Busybox support idiocy? On 11/03/2015 19:02, Harald Becker wrote: It is neither a knowledge nor any technical problem, it is preference: I want to have *one* static binary in the minimal system, and to be able to run a full system setup with this (system) binary (or call it a tool set). I agree it's fun to have all tools in one static binary. But I don't see any serious reason to make it an absolute condition. You speak of *preference*, but this very one looks pretty futile. I don't see the problem with having even a dozen applications, all static; why not, I'm also a fan of static linking. Best regards. Didier
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 12:29, Michael Conrad wrote: On 3/13/2015 3:25 AM, Harald Becker wrote: 1 - kernel-spawned hotplug helpers is the traditional way, 2 - netlink socket daemon is the right way to solve the forkbomb problem ACK, but #2 blocks usage for those who like / need to stay at #1 / #0 In that case, I would offer this idea: All you do is throw in complex code sharing and the need to choose a mechanism ahead at build time, to allow for switching to some newer stuff ... but what about pre-generated binary versions: which mechanism shall be used in the default options, which mechanism shall be offered? Please review it again. My solution solves both #1 and #2 in the same binary, with no code duplication. First complex code reuse, and then: How will you do it without suffering from the hotplug handler problem as current mdev does? I don't see that you try to handle this problem. My solution is to let the kernel hotplug handler mechanism also benefit, and avoid that parallel parsing for each event. ... besides that, this close / open_netlink looks suspicious; looks like a possible race condition. What is wrong with splitting a complex job into different threads? The splitting alone, with insertion of the named pipe (a long-proven IPC), is enough to let even systems based on the kernel hotplug mechanism gain a speed improvement (and expectedly less memory usage on system startup). On modern multi-core machines this will also allow the split operations to run on different CPU cores, at no extra cost. Synchronized operation - where your solution still holds the possibility of race conditions from possible parallelism. I suggested wrapping #2 in an ifdef for the people who don't have netlink at all, such as on BSD, and also anyone who doesn't want the extra bytes.
Therefore my approach allows for opt-out in the config, but this otherwise brings me to the idea of throwing a compiler error when netlink support is built on a system where it is not available, or optionally a warning and auto-disabling of netlink support (usually a 4-line snippet at the start of the code):

#if CONFIG_FEATURE_MDEV_NETLINK && NETLINK_NOT_AVAILABLE
#undef CONFIG_FEATURE_MDEV_NETLINK
#define CONFIG_FEATURE_MDEV_NETLINK 0
#warning This system lacks netlink support, netlink disabled
#endif

... or something similar. -- Harald
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 14:20, Didier Kryn wrote: There are interesting technical points in this discussion, but it turns out to be mostly about philosophy and frustration. ACK :( Hotplug is KISS; it is stupid, maybe, but it is so simple that you can probably do the job with a script. The same serialization you propose to implement in user space by means of several processes, a named pipe and still the fork bomb, has been implemented in the kernel without the fork bomb: it is called netlink. You mixed some things up, maybe due to my poor English: - current mdev suffers from parallel re-parsing of the conf for every event - for those who like to stay with the kernel hotplug mechanism, my approach gives some benefits, but will not solve every corner case; but it looks like I could extend the approach somewhat to make serialization easier (this needs some more checking). - for those who want to use netlink, it is a small long-lived netlink reader, pushing the event messages forward to the central back end (which frees resources when idle). That shall work as a netlink solution should. So where is your concern? Using a pipe for communication from one process to another? This is Unix IPC / multi-threading. Nothing else. These people you are talking of, who would like to see hotplug serialized but do not want netlink, do they really exist? This set of people is most likely the empty set. In case they really exist, then they must be idiots, and then, well, should Busybox support idiocy? As soon as you can prove that this set of users is empty or holds only a negligible minority, we can set the default config for the kernel hotplug mechanism to off, so it will be excluded from pre-built binaries. When nobody complains any more - that's it, you get a netlink solution. I agree it's fun to have all tools in one static binary. But I don't see any serious reason to make it an absolute condition. You speak of *preference*, but this very one looks pretty futile.
I don't see the problem with having even a dozen applications, all static; why not, I'm also a fan of static linking. I explained it already in the other thread to Laurent. It is my way. I try to avoid forcing others to do things in a specific way, but I hate being forced by others. Busybox is a public tool set and shall provide the tools which allow the user / admin to set up the system as he likes. My approach is to let others use the kernel hotplug mechanism, if they like, but still gain a performance boost, while users who like to use netlink get a netlink solution. The cost is some unused bytes in the pre-built binaries (which may be opted out in the build config). So where do I fail? Neither the optional event gathering parts (which will try to stay as fast / small as possible), nor the parser / device operation handler, works differently than before (except some code reordering to avoid parsing the conf file for each event). The job mdev does has just been split up into different threads, using a proven inter-process communication (IPC) technique. Again, where do I fail? -- Harald
Re: RFD: Rework/extending functionality of mdev
On 13.03.2015 15:46, Michael Conrad wrote: I thought pseudocode would be clearer than English text, but I suppose my pseudocode is still really just English... Maybe some comments will help. You can't fix the shortcomings of your code with some comments ... besides that, it looks like you dropped an #ifdef. The new code would not be run like a hotplug helper; it would be run as a daemon, probably from a supervisor. But the old code is still there and can still be run as a hotplug helper. The new code behaves exactly like the old code. When used as a hotplug helper, it suffers from parsing the conf for each event. My approach is a split of the old mdev into two active threads, which avoids those problems even for those who like to stay with kernel hotplug. ... and those who like to use netlink can choose netlink and get netlink, using the same back end as the kernel hotplug. So where am I wrong? What is the reason for your concern? Using a pipe as IPC? That fifo supervisor? What does my approach not do that you need (except staying completely with the old, suffering code)? -- Harald
Re: RFD: Rework/extending functionality of mdev
Interrupts ... no trouble, everybody agrees you need only one unblocked interrupt source, but never ask for the detail of which one ... :) Hi Laurent ! I'm sorry if I came across as dismissive or strongly opposed to the idea. It was not my intent. My intent, as always, is to try and make sure that potential new code 1. is worth writing at all, and 2. is designed the right way. I don't object to discourage you, I object to generate discussion. Which has been working so far. ;) ACK I understand your point. If I don't modify my mdev.conf, everything will still work the same and I can script the functionality if I prefer; but if you prefer centralizing the mounts and other stuff, it will also be possible to do it in mdev.conf. This is my primary intention. It is very reasonable. The only questions are: - what are the costs of adding your functionality to mdev.conf ? - are the benefits worth the costs ? I don't know the exact costs ahead, but they should not be massive. Look, we need a mkdir and symlink plus setting owner/permissions, and we need to set up an arg vector to call mount. The rest will be some rework of the parser, but I don't count this as cost. So the extended syntax shouldn't add much. OK, maybe an include option in mdev.conf has some extra cost ... but the benefit would be for all who like to split mdev.conf into separate files. A BB config option could allow include in mdev.conf. Some more cost will result from the possibility of using netlink, but the benefit will be not parsing the table for every event. So some cost is acceptable for me. Either part, hotplug handler or netlink reader, may be excluded in the BB config. I see no trouble grouping the code and excluding it when the option is deselected. Minor changes may persist, but shall not blow out the principles Busybox is based on, else I have a big trash can ...
As a preamble, let me just say that if you manage to make your syntax extensions a compile-time option and I can deactivate them to get the same mdev binary as I have today, then I have no objection, apart from the fact that I don't think it's good design - see below. Deactivating unwanted stuff in BB config is my intention. Deactivation will not result in same binary, due to some intended parser changes, but cost for this shouldn't be much notable ... ... and yes, I know to be picky on size restricted development. I started my programming practice on an 8008 with 512 Byte ROM and 128 Byte RAM ... in addition to a Zuse Z31 computer (one of the first computer models using transistors) with 2000 words magnetic core memory of each 11 decimal digits ... :) My point, which I didn't make clear in my previous message because I was too busy poking fun at you (my apologies, banter should never override clarity), is that I find it bad design to bloat the parser for a hotplug helper configuration file in order to add functionality for a /dev initialization program, that has *nothing to do* with hotplug helper configuration. Ok, here we agree. I would highly deprecate blowing up the hotplug scanning, but here we come to the reason I stumbled about and asked if we can avoid this extra parsing all together, speeding up all. So I was back at netlink. The confusion between the two comes from two places: - the applet name, of course; I find it unfortunate that mdev and mdev -s share the same name, and I'd rather have them called mdev and mdev-init, for instance. ACK, this could be done, for all the functionality ... but this will be a philosophical discussion. As mdev the hotplug helper does not use the code of mdev -s, there is no time cost to have that in one binary. 
- the fact that mdev's activity as a hotplug helper is about creating nodes, fixing permissions, and generally doing stuff around /dev, which is, as you say, logically close to what you need to do at initialization time, so at first sight the configuration file looks like a tempting place to put that initialization-time work. But I maintain that mdev.conf should be for mdev, not for mdev -s. mdev -s is just doing the equivalent of calling mdev for every device that has been created since boot. If you make a special section in mdev.conf for mdev -s, this 1. blurs the lines even more, which is not desirable, and 2. bloats the parser for mdev with things it does not need after the first mdev -s invocation. To 1. - ACK, see below. To 2. - as I intend to avoid parsing on every event, that is, I would like to parse all rules into a memory table and then scan only the memory table for each arriving event, the extra code in the parser does not add to your concern. Besides this, I tried to choose the syntax carefully, to produce not much overhead. We have two checks: one on the last char of the regex (slash means directory, at-sign means symlink - ignored on hotplug), and the second on the percent sign of the mount file system type.
Re: RFD: Rework/extending functionality of mdev
On Wed, 11 Mar 2015 14:30:13 +0100 Harald Becker ra...@gmx.de wrote: Hi Natanael ! Hi Isaac ! Looks like you misunderstand my approach for the mdev changes ... ok, maybe I explained it the wrong way, so let's go one step back and start with a wider view: IMO Busybox is a set of tools which allows setting up a working system environment. How this setup is done is up to the distro / system maintainer; that is, it is up to their preferences. I really like the idea of an optimized version using netlink for the hotplug events, avoiding unnecessary forking on each event, but there are other people who dislike that approach and prefer to use the hotplug handler method, or even ignore the hotplug feature completely and set up device nodes only when required (semi-automatic: not manual mknod, but manual invocation of mdev). The world is not uniform, where all people share the same preferences, so we need to be polite, accept those different kinds of preferences, and not try to force someone to set up their system in a specific way. Right? ... else we would be at the end of the discussion and the end of my project approach :( ... but I think you will agree: As Busybox is the used tool set, it shall provide the tools for all users, and shall not try to force those users to use netlink, etc. So a netlink listener should not be implemented in mdev. I agree on that. ... My idea is a fifo approach. FIFO = first-in-first-out I assume that you are talking about named pipes (aka fifos) http://en.wikipedia.org/wiki/Named_pipe This allows splitting the device management functionalities. Whichever approach to gathering the device information is used, the parser and device handling part can be shared (even in a mixed usage scenario).
So we have the following functional blocks for our device management: - initial setup of the device file system environment (yes, can be done by shell scripting, but it is a functional block) - starting the fifo management and automatic parser invocation (long-lived minimalistic daemon) - manual scanning of the sys file system and gathering device info - setting up the usage of the hotplug helper system (check fifo availability and set the hotplug helper in the kernel) - a hotplug helper spawned by the kernel on every event (should be as small / fast as possible) - a netlink-based event receiver (long-lived daemon, small memory footprint) Why do you need a hotplug helper spawned by the kernel when you have a netlink listener? The entire idea of the netlink listener is to avoid the kernel-spawned hotplug helper. It simply does not make sense to have both. - the device node handling part (conf table parser / calling the required operation) Here the gathering parts may be used according to the user's preferences (and may be opted out in the BB configuration). Every gathering part grabs the required information, sanitizes, serializes and then writes some kind of command to the fifo. The fifo management (a minimalist daemon) starts a new parser process when there is none running and watches its operation (failure handling). If you are talking about named pipes (created by mkfifo) then your fifo approach will break here. Once the writing part of the fifo (eg a gathering part) is done writing and closes the writing end, the fifo is consumed and needs to be re-created. No other gathering part will be able to write anything to the fifo. Basically what you describe as a fifo manager sounds more like a bus, like dbus. I think you are on the wrong track. Sorry. -nc
Re: RFD: Rework/extending functionality of mdev
On Wed, Mar 11, 2015 at 7:02 PM, Harald Becker ra...@gmx.de wrote: On 11.03.2015 16:21, Laurent Bercot wrote: I don't understand that binary choice... You can work on your own project without forking Busybox. You can use both Busybox and your own project on your systems. Busybox is a set of tools, why should it be THE set of tools ? Sure, I know how to do this, I started creating adapted Busybox versions to specific needs for a minimalistic 386SX Board. Around 1995, or so ... wow, long time now :) It is neither a knowledge nor any technical problem, it is preference: I want to have *one* statical binary in the minimal system, and being able to run a full system setup with this (system) binary (or call it tool set). All I need then is the binary, some configs, some scripts (and may be special applications). I even go so far, to run a system with exactly that one binary only, all other applications functions are done with scripting (ash, sed, awk). Sure those are minimalist (dedicated systems), but they may be used in a comfortable manner. I even started a project to create a file system browser (comparable to Midnight Commander, with no two pane mode but up to 9 quick switch directories), using only BB and scripting. All packed in a (possibly self extracting) single shell script. The only requirements to run this, is (should be) a working BB (defconfig) environment, usual proc, sys, dev, setup and a writable /tmp directory (e.g. on tmpfs). The work for this was half way through to first public alpha, then Denys reaction on a slight change request was so frustrating that I pushed the project into an otherwise unused archive corner, and stopped any further development. I'm not sure how heavily mdev [-s] relies on libbb and how hard it would be to extract the source and make it into a standalone, easily hackable project, but if you want to test architecture ideas, that's the way to go - copy stuff and tinker with it until you have something satisfying. 
I always did it this way, and never posted untested stuff, except some snippets when someone asked for something and I quickly hacked up an answer (marked as untested). ... if not, you still have your harald-mdev, and you can still use it along with Busybox - you'll have two binaries instead of one, but even on whatever tiny noMMU system you work on, you'll be fine. Sure, I could have two, three, four, ten, twenty, a hundred ... programs, but my preference is to have *one* statically linked binary for the complete system tool set (on minimal systems). So why don't you write such a binary wrapping Busybox and other things, then? I think the KISS principle still ought to be alive these days. Not to mention, Denys already cannot cope with the maintenance the way that would be ideal. For instance, some of my IMHO serious bugfixes remain uncommented. Putting more and more stuff into Busybox would just make the situation worse and more frustrating, sorry. I really do not want Busybox to follow the systemd way. On the positive side of systemd, they at least have far more resources than what Denys can offer, at least in my opinion, so ... ... That's the reason why I dislike and don't use your system approach :( ... otherwise great work :) That does not preclude design discussions, which I like, and which can happen here (unless moderators object), and people like Isaac, me, or obviously Denys, wouldn't be so defensive - because it's now about your project, not Busybox; you have the final say, and it's totally fine, because I don't have to use harald-mdev if I don't want to. One of the things I really hate is being forced to do something (especially in a specific way), only topped by someone else forcing me to do it in a specific way :( ... ... so I always try to do modifications in a way that lets others decide about usage, expecting not to break existing setups (at least without asking ahead if welcome).
Slight modifications may be unavoidable (e.g. a different parameter notation), but they shall not require complete changes to the system setup. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
Hi Isaac ! On 12.03.2015 02:05, Isaac Dunham wrote: I just don't think you're quite thinking through exactly what this means. In which sense? Let me know what I got wrong. I'm a human making mistakes, not a perfect machine. It seems like you want to force everyone who uses mdev to use a multi-part setup. Whoops, you got that definitely wrong! Who told you I want to force you to use a different setup? My clear intention was to add some extra flexibility without breaking existing usage. ... but criticism arose, which poked for more clarity. Only due to this, and the wish to fit the preferences of as many people as possible, there might result a slight change: one extra operation may be required (when not combined / hidden in the mdev system for clarity). At the location where you set up the hotplug handler or do mdev -s, you either need a slight change of the command parameters or need to insert a single extra command (details on this have not been discussed yet). But that's it. Are you really concerned about inserting a single extra line in your startup scripts, or having a modification to one of those lines? That way you would block any innovation, *and* the needs of other people. Specifically, you are proposing to *replace* the standard hotplugger design (where the kernel spawns one program that then sets up the device itself) with a new design where there's a hotplugger that submits each event to a longer-lived program, which in turn submits the events to a program that creates/sets up the devices. You say I propose a change of the device management system? The mdev system suffers on event bursts, and this has been grumbled about several times in the past. What I'm now trying is to build a solution for this into Busybox. Not reinventing the wheel, but implementing known solutions. I am saying, don't force this design on people who want the hotplug helper that the kernel spawns to finish things up.
The only way to solve this would be to leave mdev as it is and create an additional netlink-mdev, which brings us to the situation where it gets complicated (or at least complex) to share code and work between the two. Selection between them (when you don't want to include both hunks) needs to be done with BB configuration, which makes things like pre-built binaries a mess (how many different versions shall be maintained? Only yours? Only mine?). To solve this conflict, and to also give the current device management system some speed improvement on bursts, I tried to find a solution. This solution centers on the most important problem: the parallelism, and parsing the conf for each single hotplug event. So I *propose* to split the function of mdev into two separate parts. Part one will contain the kernel hotplug helper part (and should be as small and fast as possible), and part two the parsing of the conf and the device node operations ... with the requirement of communication between any number of running part ones and the single part two. To reduce the cost of those part-one hunks (remember, they shall be as fast as possible), and to provide failure management (which BB currently completely lacks), the communication between the parts needs to be watched by a small helper daemon. A very minimalist daemon, which doesn't touch any data; it just fires up the device operation stuff when required and waits until that process dies, then goes back to waiting for more events to arrive (this is not like udev). Do I wish to overcome the suffering of Busybox device management? Yes. Does this need some changes? Yes, I propose a one- or two-line change in your startup files, with the benefit of a speed improvement. Do I otherwise propose to change your system setup or flip to using netlink operation? *NO* Hence, I do not force anybody to change from using the kernel hotplug feature. I improve that mechanism ...
with the ability to add further mechanisms without duplicating code. One shall be a netlink reader in BB; the other may be external programs of others, with a better interface to use device management functions from there. Are you really complaining against any kind of innovation? Any step to overcome long-standing problems, discussed several times ... but dropped, mostly due to the amount of required work? Do I *propose* some innovation? Yes. Do I propose to force someone to change to a different mechanism? No. Agreed. But I would include hotplug daemons under etc. I used etc., so add any similar type of program you like, including hotplug daemons ... but stop, what are hotplug daemons? Do you mean daemons like udev? udev uses netlink reading. Otherwise I know about hotplug helper programs (current mdev operation), but not about hotplug daemons. That's the best description I can come up with for the FIFO reader that you proposed, which would read from a fifo and write to a pipe, monitoring the state of mdev -p Wow? You got it wrong!
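A minimal sketch of the fifo-watcher design described in this thread, under stated assumptions: the fifo path and the "handler" function are hypothetical stand-ins (here the handler just counts events), not an actual Busybox interface. The point it illustrates is that one parser process is spawned per *burst* of events, not per event:

```shell
#!/bin/sh
# Sketch of the proposed fifo-watcher idea. "handler" stands in for the
# conf-parsing back end; here it merely counts events for demonstration.
FIFO="${FIFO:-$(mktemp -u)}"
mkfifo "$FIFO"

handler() {
    # One parser instance consumes a whole burst of events, then exits
    # when every writer has closed its end of the fifo.
    n=0
    while read -r event; do n=$((n + 1)); done
    echo "$n"
}

# One round of the watch loop: block until writers appear, run one handler.
watch_once() {
    handler < "$FIFO"
}

# Simulate two kernel-spawned hotplug helpers writing one event each:
{ echo "add /class/block/sda"; echo "add /class/block/sdb"; } > "$FIFO" &
events=$(watch_once)
wait
rm -f "$FIFO"
echo "handled $events events with one parser process"
```

A real watcher would wrap `watch_once` in an endless loop and spawn a failure script when the handler dies abnormally, as described above.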
Re: RFD: Rework/extending functionality of mdev
Hi ! To Michael: Don't be confused; Natanael provided an alternative version to achieve the initial device file system setup (which isn't bad, but may have its cons for some people on small resource-constrained systems in the embedded world). So I left it out for clarity ... but it still may be implemented / used as an alternative method for setup. To Natanael: On 12.03.2015 10:14, Natanael Copa wrote: - The third method (scanning the sys file system) is the operation of mdev -s (initial population of the device file system; don't mix it up with mdev without parameters); in addition, some users take this as semi-automatic device management (embedded world) I disagree here. mdev -s solves a different problem. No. There are 2 different problems. 1) Handle hotplug events. There are only 2 methods for this problem: A) using 1 fork per event (eg /sbin/mdev in /proc/sys/kernel/hotplug). This is simple but slow. This is mdev (without any parameters). B) reading hotplug events via netlink using a long-lived daemon (like udev does). Currently not in mdev, but it shall be an alternative mechanism in my implementation, so you may choose and use the mechanism you like. 2) Do cold plugging. At boot you need to configure the devices the kernel already knows about. This is the operation of mdev -s. B) Solve problem 1 mentioned above (set up a hotplug handler) and then trigger the hotplug events. Now you don't need to scan sysfs at all. But still there are people in the embedded world who like (or are forced) to use semi-automatic device handling, that is, calling something like mdev -s to scan the sys file system for new devices. My approach is to give them all the possibility to do device management with the mechanism they like, without maintaining different device management systems and duplicating the code. Three different mechanisms, with three different front ends, and one shared back end. ... the only thing I see is a difference in initial device system population.
You provided a different method to trigger the coldplug events (and yes, I understand that approach), but that one will only work when you use either the kernel hotplug helper mechanism or the netlink approach. You drop those who can't / don't want to use either. I left that out in the short summary for Michael, but didn't forget your hint. Maybe we can have an additional alternative for the setup part, implementing event triggering (still doing the sys file system scan, but with different handling then) ... or you do it in scripting: set up your hotplug handler or netlink listener and then run a script to trigger the plug events (nice idea otherwise, I like it, but I won't unnecessarily drop people on the other end). What I currently do is: - for problem 1 (dealing with hotplug events) I use method A. - for problem 2 (set up devices at boot) I use method A because my hotplug handler is slow due to many forks. What I would like to do is switch to method B for both problem 1 and problem 2. However, I want the long-lived daemon to consume less memory, and I want to be able to use the mdev.conf syntax for configuring the devices. No problem; my approach shall give you the possibility to do it the way you like, without blocking the others. Still you need to do the following steps on system startup: - initial creation of the device file system (out of scope of mdev) - prepare your system for hotplug events, either the kernel method or netlink (if you go that way) - trigger initial device file system population (cold plug) (may be done in two ways, yours or the old mdev -s) - So how can we have all three methods without duplication of code, plus how can we speed up the kernel hotplug method? IMHO, there are not 3 methods, so the rest of the discussion is useless. You are going to force others to do it the way you like, and everything else is of no interest? I'm trying to give most people the possibility to do it the way they like, your way included! ...
without blocking other methods (or maybe external implementations reusing the conf / handling back end). ... as with other functions in BB, unwanted functionalities may be opted out in the build config. ... so this would mean for you: include the back end and the netlink into your Busybox, do the usual device file system creation, activate the netlink handler (which shall auto-start the fifo watcher), then trigger the cold plug events (undecided whether done with a script or added to the binary). Anything wrong with this? -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
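The two startup steps argued over above (register a hotplug handler, then trigger coldplug by replaying uevents) can be sketched in shell. This is a sketch under stated assumptions: it runs against a tiny stand-in directory so it is testable anywhere; on a real system SYS would be /sys and the writes would make the kernel re-emit the events:

```shell
#!/bin/sh
# Sketch of method-A setup plus uevent-replay coldplug, demonstrated
# against a fake sysfs tree (SYS) instead of the real /sys.
SYS="${SYS:-$(mktemp -d)}"

# Build a tiny stand-in for /sys with two devices exposing uevent files:
mkdir -p "$SYS/class/block/sda" "$SYS/class/block/sdb"
: > "$SYS/class/block/sda/uevent"
: > "$SYS/class/block/sdb/uevent"

# Step 1 (method A): register the kernel hotplug helper. On a real system:
#   echo /sbin/mdev > /proc/sys/kernel/hotplug

# Step 2: coldplug. Writing "add" to every uevent file asks the kernel to
# replay the matching hotplug event through whatever was set up in step 1
# (or through a netlink listener, method B). No sysfs device scan needed.
find "$SYS" -name uevent | while read -r f; do
    echo add > "$f" 2>/dev/null
done

triggered=$(grep -l add "$SYS"/class/block/*/uevent | wc -l)
echo "replayed $triggered coldplug events"
```

On the fake tree the "add" simply lands in the files, which is enough to show the traversal; the real kernel consumes the write and emits an event instead.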
Re: RFD: Rework/extending functionality of mdev
On Thu, Mar 12, 2015 at 6:00 PM, Harald Becker ra...@gmx.de wrote: Hi Laszlo ! So why don't you write such a binary wrapping Busybox and other things, then? I think the KISS principle still ought to be alive these days. Not to mention, Denys already cannot cope with the maintenance the way that would be ideal. For instance, some of my IMHO serious bugfixes are uncommented. Putting more and more stuff into Busybox would just make the situation worse and more frustrating, The usual way of development is: (1) planning the work to do (the step we are discussing here) (2) code hacking (what I will do next) (3) preliminary testing (also my job) (4) offering access to those who like (for further testing) (5) fixing complaints (6) putting it into the mainstream (or making it accessible to the rest). Right? So what is your complaint? sorry. I really do not want Busybox to follow the systemd way. Who told you I'm trying to go that way? My intention is to overcome the mdev problems, and allow those who like to use the netlink interface. I dislike encoding any fixed functionality in a binary, and I don't force anybody to use possible extensions. Laurent poked me toward more clarity, which would mean splitting early initialization from mdev operation, with the caveat of a slight change in init scripts (maybe one more command or some extra command parameter; it could be done automatically, but that means different functionalities in the applet - maybe more discussion is required on that). Beside that, it shall be up to the system maintainer to choose which device management mechanism to use (BB shall provide the tools for that: small modular tools, bound together by the admin - no big monolith). On the positive side of systemd, they at least have far more resources than what Denys can offer, at least in my opinion, so ... You are talking about development resources? Here they are! I'm willing to do that job, not asking for someone else to do the work.
It is not only about feature development resources, but also maintenance. Denys will be the maintainer for new busybox code as far as I am aware. I have not seen this model change for a very long while, sadly. Once, I tried to ask for changing that model, but I was apparently just shooting myself in the foot, based on the feedback. It is nice that you are trying to help and I certainly appreciate it, but why can't you simply do that job nicely outside busybox, where *you* have to be responsible for that project? It would be an explicit way of enforcing KISS and not putting more burden on Denys. He may enjoy maintaining more and more code, but in all honesty, with due respect and appreciation, I do not enjoy it when developers do not get responses to their patches and all depend only on Denys with regard to upstreaming. It is also a bit demotivating that we do not get early feedback and then realize that our patches are completely reworked by Denys. It is unfortunate that our work almost goes to /dev/null. I do not feel appreciated. This is all happening because of more and more complex stuff being maintained by one person. It is not a personal offense against Denys, to be fair. It is a model I very much disagree with in general. If you can convince the busybox community to split up the maintainership, perhaps that would be a completely different discussion to start with, but in all honesty, I do not like these monolithic projects. I still stand by KISS being a good thing. If I could, I would personally replace busybox with little custom tools on my system, but I currently do not have the resources for that. Therefore, all the complexity and non-KISS that goes in is something I need to accept. I'm asking about which preferences other people have, so I'm able to make the right decisions before I start hacking code ...
Asking for feedback is good, nothing wrong there; putting this into busybox this way is, on the other hand, wrong IMHO. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On Thu, Mar 12, 2015 at 04:04:41PM +0100, Harald Becker wrote: Hi Isaac ! On 12.03.2015 02:05, Isaac Dunham wrote: I just don't think you're quite thinking through exactly what this means. In which sense? Let me know what I got wrong. I'm a human making mistakes, not a perfect machine. It seems like you want to force everyone who uses mdev to use a multi-part setup. Whoops, you got that definitely wrong! Who told you I want to force you to use a different setup? My clear intention was to add some extra flexibility without breaking existing usage. ... but criticism arose, which poked for more clarity. Only due to this, and the wish to fit the preferences of as many people as possible, there might result a slight change: one extra operation may be required (when not combined / hidden in the mdev system for clarity). At the location where you set up the hotplug handler or do mdev -s, you either need a slight change of the command parameters or need to insert a single extra command (details on this have not been discussed yet). But that's it. Are you really concerned about inserting a single extra line in your startup scripts, or having a modification to one of those lines? That way you would block any innovation, *and* the needs of other people. No, you misunderstand. Read my proposal below and tell me why this won't do what you're after, OTHER than "the way mdev works now is broken/wrong", since that *isn't* universally accepted. As you stipulated about your design, applet and option names can be changed easily; but when I say "new applet", I mean to indicate that this should be separate from mdev.
* mdev (no options) ~ as it works now (ie, a hotplugger that parses mdev.conf itself)
* mdev -s (scan sysfs) ~ as it works now, or could feed the mdev parser
* mdev -i/-p (read events from stdin) ~ the mdev parser, accepting a stream roughly equivalent to a series of uevent files with a separator following each event[1].
To make it read from a named pipe/fifo, the tool that starts it could use dup2().
* new applet: nld ~ netlink daemon that feeds the mdev parser.
* new applet: fifohp ~ your hotplug helper, fifo watch daemon that spawns a parser, and hotplug setup tool.
I had actually thought that it might work at least as well if, rather than starting a daemon at init, the fifo hotplugger checks whether there's a fifo and *becomes* the fifo watch daemon if needed. Also, I was thinking in terms of writing to a pipe because that lets us make sure events get delivered in full (ie, what happens if mdev dies halfway through reading an event?). This way,
+ mdev is the only applet parsing mdev.conf;
+ all approaches to running mdev are possible;
+ it's easy to switch from mdev to the new hotplugger, while still having mdev available if the new hotplugger breaks;
+ mdev is only responsible for tasks that involve parsing mdev.conf.
And people who want the change don't have to do more than your proposal would require. [1] The format proposed by Laurent uses \0 as a line terminator; I think it might be better to use something that's more readily generated by standard scripting tools from uevent files, which would make it possible to use cat or env to feed the mdev parser. Thanks, Isaac Dunham ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
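A sketch of what footnote [1] gestures at: an event stream that plain scripting tools can generate from sysfs uevent files. The blank-line separator, the sample uevent contents, and the `emit_events` name are assumptions for illustration; the actual separator was left undecided in this thread:

```shell
#!/bin/sh
# Sketch: building a parser-ready event stream from uevent files with
# nothing but cat and echo, per the footnote above. Uses a fake sysfs
# tree so it can run anywhere; on a real system SYS would be /sys.
SYS="${SYS:-$(mktemp -d)}"
mkdir -p "$SYS/dev"
printf 'ACTION=add\nDEVNAME=sda\nSUBSYSTEM=block\n' > "$SYS/dev/uevent"

# cat plus one echo per file yields newline-terminated KEY=VAL records,
# with a blank line closing each event (assumed separator):
emit_events() {
    for u in "$SYS"/*/uevent; do
        cat "$u"
        echo
    done
}

# A parser stub: awk's paragraph mode splits the stream on blank lines,
# so NR at the end equals the number of events in the stream.
count=$(emit_events | awk 'BEGIN{RS=""} END{print NR}')
echo "stream carries $count event(s)"
```

The appeal of a format like this is exactly the footnote's point: cat, env, and awk can all produce or consume it without a binary-safe layer.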
Re: RFD: Rework/extending functionality of mdev
On 12/03/2015 18:26, Isaac Dunham wrote: [1] The format proposed by Laurent uses \0 as a line terminator; I think it might be better to use something that's more readily generated by standard scripting tools from uevent files, which would make it possible to use cat or env to feed the mdev parser. An uevent sent over netlink is already a series of null-terminated strings. If you want to make text-scripting tools able to process those messages, then you need to make the netlink listener convert the events. The advantage of \0 terminators is that they can't appear anywhere else in the strings. Changing the terminators requires either a quoting / parsing layer, which is hard and expensive, or making assumptions about the format of the messages. It would be feasible, for now, to assume that \n does not appear in uevent strings, and to replace all instances of \0 (including my use of an extra \0 as an event terminator) with \n. But there's no guarantee that \n won't appear in a message in the future, and I'd rather avoid introducing constraints that don't need to be introduced. That's my rationale for sticking with \0. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
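To make Laurent's point concrete, here is a sketch of the conversion layer he describes: a netlink uevent payload is a run of \0-terminated KEY=VAL strings, and tr can turn it into text - valid only under the exact assumption he flags, namely that no value ever contains \n. The simulated payload and the `payload` function name are illustrative, not real listener output:

```shell
#!/bin/sh
# Simulate a netlink uevent payload: header string plus KEY=VAL strings,
# each terminated by \0 (as the kernel sends them).
payload() {
    printf 'add@/class/block/sda\0ACTION=add\0DEVNAME=sda\0'
}

# The \0 -> \n conversion layer for text tools. Safe only if values are
# guaranteed newline-free - the assumption Laurent prefers not to make.
as_text=$(payload | tr '\0' '\n')

# Now ordinary line-oriented tools work on it:
count=$(printf '%s\n' "$as_text" | grep -c '=')
echo "parsed $count KEY=VAL pairs"
```

Keeping \0 end to end avoids this layer entirely, which is the rationale given above.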
Re: RFD: Rework/extending functionality of mdev
Hi Natanael ! I assume that you are talking about named pipes (aka fifos) http://en.wikipedia.org/wiki/Named_pipe Ack, a fifo device in the Linux / Unix world. Why do you need a hotplug helper spawned by the kernel when you have a netlink listener? The entire idea of the netlink listener is to avoid the kernel-spawned hotplug helper. ... because there are people who dislike using netlink and want to use the kernel hotplug helper mechanism. That's it. People's preferences are different. Opt out the functions you dislike in the BB config. ... but this is vice versa for those who choose to use the kernel hotplug mechanism. It simply does not make sense to have both. Both active at the same time? Sure! ... This is not the intention. I've been talking about the functionalities which need to be implemented. Every gathering part grabs the required information, sanitizes and serializes it, and then writes some kind of command to the fifo. The fifo management (a minimalist daemon) starts a new parser process when there is none running and watches its operation (failure handling). If you are talking about named pipes (created by mkfifo) then your fifo approach will break here. Once the writing part of the fifo (eg a gathering part) is done writing and closes the writing end, the fifo is consumed and needs to be re-created. No other gathering part will be able to write anything to the fifo. ??? You don't understand the operation of named pipes! Any program (with appropriate rights) may open the named pipe (fifo) for writing. As long as the data for one event is written in one big chunk, it won't interfere with possible parallel writers. If the fifo device is closed by a writer, this does not mean it vanishes; just reopen it for writing and write the next event chunk. Basically what you describe as a fifo manager sounds more like a bus, like dbus. It is the old and proven inter-process communication mechanism of Unix, nothing new.
And "fifo manager" sounds really big; it is a really minimalistic daemon with primitive operation (it never touches the data in the fifo, nor even reads that data). Its main purpose, besides creating the fifo, is to fire up a conf parser / device operation process when required, and to react to its failure (spawn a script with args). -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
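Harald's correction above - a named pipe is not "consumed" when a writer closes it - is easy to demonstrate. A small sketch (paths are temporary stand-ins) where two independent writers open the same fifo in turn, with no re-creation between them:

```shell
#!/bin/sh
# Demonstrates fifo reopen semantics: closing the write end does not
# destroy the fifo; any later writer may open it again.
d=$(mktemp -d)
mkfifo "$d/events"

# First writer opens, writes one chunk, closes its end:
echo "event-1" > "$d/events" &
first=$(head -n 1 "$d/events")
wait

# The very same fifo node accepts a second, independent writer:
echo "event-2" > "$d/events" &
second=$(head -n 1 "$d/events")
wait

rm -r "$d"
echo "$first $second"
```

The related guarantee Harald leans on for parallel writers - that a chunk written in one write() is not interleaved with other writers' data - holds for writes up to PIPE_BUF bytes (at least 512 per POSIX, 4096 on Linux).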
Re: RFD: Rework/extending functionality of mdev
The question I was asking was only about this: On 3/12/2015 12:04 PM, Harald Becker wrote: but that one will only work when you either use the kernel hotplug helper mechanism, or the netlink approach. You drop out those who can't / doesn't want to use either. ...which I really do think could be answered in one paragraph :-) If the netlink socket is the right way to solve the forkbomb problem that happens with hotplug helpers, then why would anyone want to solve it the wrong way? I don't understand the need. -Mike ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
Hi Laurent ! Out of curiosity: what are, to you, the benefits of this approach ? What are the benefits of preferences? ... good question!? ;) Does it actually save you noticeable amounts of RAM ? Maybe a few bytes ... noticeable? What is your threshold for noticeable here? ... otherwise I would say *NO* of disk space ? Which disk space? ... which disk? ... looking around ... not seeing a disk (in your sense) ... shot: complete system from initramfs ... a disk in this sense is some external USB data storage (only data, never used for system purposes). Is it about maintenance - just copy one binary file from a system to another ? Hit! (But you'd also have to copy all your scripts...) Second file, a tar, but all architecture-independent; third file, a tar, with pre-configured setups. Is it about something else ? Yes, the most important! ... It's my way, the way I did it for several commercial projects ... and the way I like to do it ... clarity / simplicity / purism ... preference! :) If it's just for the hacking value, I can totally respect that, but it's not an argument you can use to justify architectural modifications to a Unix tool other people are using, because it kinda goes against the Unix philosophy: one job, one tool. Busybox gets away with it ... Better to call Busybox a tool set; it is several commands and a library linked together to share some code size. Beside the invoking logic of the applets (multicall), the applets have to be considered separate commands ... though some commands tend to forget that. I won't try to hide fixed system functionality in a binary (better say program or command here), except for fallback operation (e.g. last-resort handling). My usual approach is to spawn a script when it's time to handle system-dependent things (or things that need to be under admin control). ... but I like to describe in configuration what to do, not how to do it (as is done in scripts).
So I like to have simple lists describing my system, let a one-shot command parse that list, and call the required programs / commands / scripts with the configured information from the lists, to do the job. e.g.

# required virtual file systems
/proc     root:root 0755 %proc
/sys      root:root 0755 %sysfs
/dev      root:root 0755 %tmpfs size=64k
/dev/pts  root:root 0755 %devpts

(This describes my system setup - a selected part - without describing how to get there.) ... and yes, that could be done with shell scripting ... the way I've been doing it for years ... but still things tend to be scattered around, so I'd like to put setting up the virtual file systems (excluding /tmp, which I set up in fstab) and preparing the device file system (including the device descriptions) in one central place (that was mdev.conf). Currently I put those lines in comments and filter them out into a shell script, but this is sometimes confusing. (and I believe that inclusion of supervisors and super-servers is already too much). ACK ... but what do you think about e.g. tcpsvd (accepting incoming tcp connections), or a netlink reader? My fifo watcher does a comparable job to tcpsvd (which brings me to the idea of creating it as a fifod applet in BB with appropriate options). We could use TCP connections, but the system cost of a fifo / named pipe should be below the cost of running through the network stack. Even on a noMMU system, I think what you'd gain from having one single userspace binary (instead of multiple small binaries, as I do on my systems) is negligible when you're running a Linux kernel in the first place, which needs at least several megabytes even when optimized to the fullest. A Linux kernel including the cpio to set up initramfs is around 6 to 8 MByte on modern kernel versions (complete system: kernel + system tools + application scripts) ... running on a system with 64 MByte or even 16 to 20 MB. No disks at all. Boot from the CD-ROM drive, then turn it off. ... but nowadays more likely boot from a USB stick :) ...
a 256 MByte stick :) ... vintage ... 32 MB boot partition, boot loader files + boot images + system config; the rest of the stick is data storage. I know a guy who manages to run almost-POSIX systems in crazy tiny amounts of RAM - think 256 kB, and the TCP/IP stack takes about 40 kB - but that's a whole other world, with microcontrollers, JTAG, a specific OS in asm, and ninja optimization techniques at the expense of maintainability and practicality. Full-duplex serial bridging between a PLC bus system and a special synchronous clocked bus system for hazardous areas, with a bit rate of 31 kbps and manual Manchester code detection and bit shifting, on an 8-bit CPU at 8 to 12 MHz with 16 kB EPROM and 1 kByte RAM, and a frame size of up to 300 bytes in each direction ... and at least one bus side required sending the packet checksum in the header :( ... microcontroller ... and OS? What's that? Which OS? The first instruction executed by the CPU after reset at address X ... that was my
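The "describe what to do, not how" list earlier in this message lends itself to a one-shot parser. A sketch under stated assumptions: the line format follows Harald's example (mountpoint, owner, mode, %fstype, optional options), but the parser itself and its output are hypothetical - it prints the commands it would run instead of executing them:

```shell
#!/bin/sh
# One-shot parser sketch for a virtual-file-system list of the form:
#   mountpoint owner mode %fstype [options]
# Comment lines start with "#". Commands are printed, not executed.
parse_list() {
    awk '!/^#/ && NF >= 4 {
        dir = $1; owner = $2; mode = $3
        fstype = substr($4, 2)              # strip the leading "%"
        opts = (NF >= 5) ? " -o " $5 : ""
        printf "mkdir -p %s && chown %s %s && chmod %s %s\n", \
               dir, owner, dir, mode, dir
        printf "mount -t %s%s none %s\n", fstype, opts, dir
    }'
}

out=$(parse_list <<'EOF'
# required virtual file systems
/proc root:root 0755 %proc
/dev root:root 0755 %tmpfs size=64k
EOF
)
printf '%s\n' "$out"
```

Piping the printed commands into sh would turn the sketch into the actual one-shot setup step described above.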
[OT] long-lived spawners (was: RFD: Rework/extending functionality of mdev)
On 11/03/2015 14:02, Denys Vlasenko wrote: But that nldev process will exist for all time, right? That's not elegant. Ideally, this respawning logic should be in the kernel. Well there is already a kernel-based solution: hotplug. Sure, it's not serialized, but it's there. If you want something serialized, then you have a stream of information you need to get to the userspace - and at this point, you might as well send it as is and let userspace sort it out, and that's exactly what the netlink does. Needing daemons to answer notifications from userspace processes or the kernel is the Unix way. It's not Hurd's, it's not Plan 9's (AFAIK), but it's what we have, and it's not even that ugly. The listening and spawning logic will have to be somewhere anyway, so why not userspace ? Userspace memory is cheaper (because it can be swapped), userspace processes are safer, and processes are not a scarce resource. -- Laurent ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
I suppose it's time to dig out code from the secret archives of my secret lair again. Someone named Vladimir Dronnikov called this ndev and proposed it as a patch to Busybox in 2009. I dug it out and separated it from Busybox, and probably made some other changes I don't remember, for some nefarious agenda I again don't remember. It is a modified version of mdev that seems to do some of the things you all have been talking about. I'm not maintaining it in any way, so you all are welcome to do whatever you want with it. I have no idea whether it works or not, other than that it apparently was once useful to me. William Haddon On 03/11/2015 11:21:42 AM, Laurent Bercot wrote: On 11/03/2015 15:56, Harald Becker wrote: And one point to state clearly: I do not want to go the way to fork a project (that is the worst expected), but I'm at a point, I like / need to have Busybox to allow for some additional or modified solutions to fit my preferences I don't understand that binary choice... You can work on your own project without forking Busybox. You can use both Busybox and your own project on your systems. Busybox is a set of tools, why should it be THE set of tools ? I'm not sure how heavily mdev [-s] relies on libbb and how hard it would be to extract the source and make it into a standalone, easily hackable project, but if you want to test architecture ideas, that's the way to go - copy stuff and tinker with it until you have something satisfying. Then, if upstream wants to integrate your modifications, if you find a reasonable compromise to merge, great; if not, you still have your harald-mdev, and you can still use it along with Busybox - you'll have two binaries instead of one, but even on whatever tiny noMMU you can work on, you'll be fine. 
That does not preclude design discussions, which I like, and which can happen here (unless moderators object), and people like Isaac, me, or obviously Denys wouldn't be so defensive - because it's now about your project, not Busybox; you have the final say, and it's totally fine, because I don't have to use harald-mdev if I don't want to. Forks are bad, but alternatives are good. -- Laurent

/*
 * ndev - hardware detection daemon - based on Busybox's mdev
 *
 * Copyright 2005 Rob Landley r...@landley.net
 * Copyright 2005 Frank Sorenson fr...@tuxrocks.com
 * Copyright 2009 Vladimir Dronnikov dronni...@gmail.com
 * Copyright 2013, 2014 William Haddon will...@haddonthethird.net
 *
 * Licensed under GPL version 2; see the file COPYING for details.
 */

/* Set your tabstop to 4 */

#define _BSD_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <libgen.h>
#include <limits.h>
#include <poll.h>
#include <pwd.h>
#include <regex.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/netlink.h>

#ifndef NDEV_CONF
#define NDEV_CONF "/etc/ndev.conf"
#endif

#define SCRATCH_SIZE 80

int scan = 0;

void emsg(char *fmt, ...)
{
	va_list val;

	va_start(val, fmt);
	vfprintf(stderr, fmt, val);
	va_end(val);
	va_start(val, fmt);
	vsyslog(LOG_ERR, fmt, val);
	va_end(val);
}

void pemsg(char *fmt, ...)
{
	va_list val;
	char *err, *nfmt;
	size_t l1, l2;

	l1 = strlen(fmt);
	err = strerror(errno);
	l2 = strlen(err);
	nfmt = malloc(l1 + l2 + 4);
	if (!nfmt)
		nfmt = fmt;
	else
		snprintf(nfmt, l1 + l2 + 4, "%s: %s\n", fmt, err);
	va_start(val, fmt);
	vfprintf(stderr, nfmt, val);
	va_end(val);
	va_start(val, fmt);
	vsyslog(LOG_ERR, nfmt, val);
	va_end(val);
	if (nfmt != fmt)
		free(nfmt);
}

char *_strchrnul(char *s, int c)
{
	char *r;

	r = strchr(s, c);
	if (!r)
		return s + strlen(s);
	return r;
}

/* Writes an entire buffer to a file descriptor. Returns count on
   success and -1 on error. */
ssize_t full_write(int fd, char *buf, size_t count)
{
	ssize_t len;
	ssize_t result;

	len = count;
	while (len) {
		result = write(fd, buf, len);
		if (result < 0 && errno != EINTR)
			return result;
		if (result > 0) {
			buf += result;
			len -= result;
		}
	}
	return count;
}

/* Reads a given number of bytes from a file descriptor. Returns the
   number of bytes read on success and -1 on error. Less than the full
   number of bytes are read only if the file ends. */
ssize_t full_read(int fd, char *buf, size_t count)
{
	ssize_t len;
	ssize_t result;

	len = count;
	while (len) {
		result = read(fd, buf, len);
		if (result < 0 && errno != EINTR)
			return result;
		if (result == 0)
			break;
		if (result > 0) {
			buf += result;
			len -= result;
		}
	}
	return count - len;
}

ssize_t copy_until_eof(int infd, int outfd)
{
	char buffer[1024];
	ssize_t result;
	ssize_t len;

	len = 0;
	while (1) {
		result = read(infd, buffer, 1024);
		if (result < 0 && errno != EINTR)
			return result;
		if
Re: RFD: Rework/extending functionality of mdev
Hi Denys !

mdev rules are complicated already. Adding more cases needs adding more code, and requires people to learn a new mini-language for mounting filesystems via mdev. Think about it. You are succumbing to featuritis. ... This is how all bloated all-in-one disasters start. Fight that urge. That's not the Unix way.

Yes! ... my failure was to mix those things without giving an explanation. Laurent pointed me to the better approach and gave the explanation: the idea is to have a one-shot point of device system initialization ... that means those device-system-related operations, and maybe deep-system-related virtual file systems (proc, sys, e.g.) ... I even exclude things like setting up a tmpfs for /tmp from this, although it could be done at no extra cost. My previous message to Natanael and Isaac shall clarify my approach and should have been the starting point of this discussion ... which is, as told, at the early phase of brainstorming about reworking the Busybox-related device management system, eliminating the long-standing problems, giving some more flexibility, and maybe adding some more functionality of benefit for those who like it (not blocking others). Though, RFD means request for discussion, not for hacking code in any way before we reach a point of agreement ... and at least some problems of mdev have already been grumbled about, as I remember ... so a discussion, giving the chance to do the work to overcome those issues ... Adding (some) extra functionality is meant to enhance flexibility (where I focused on the parts I'm most concerned with - people's preferences are different), but is only part of the work I would like to do. I tried to stay as close as possible to the current mdev.conf syntax, with the intention not to break existing setups ... but would not rule out creating a different syntax parser, if that would be the outcome of the discussion. I'm open to the results, but would like to get a solution for my preferences, too. 
It can't be that a few people dictate how the rest of the world's systems have to handle things ... the other side of what you called featuritis (I won't neglect your statement above)!

And one point to state clearly: I do not want to go the way of forking the project (that would be the worst outcome), but I'm at a point where I like / need Busybox to allow for some additional or modified solutions to fit my preferences, as others have also stated already. I'm currently willing and able to do (most of) that work, but if it's not welcome, the outcome of the discussion may also be stepping to a MyBusybox (or however it will be called). *Again*: I don't want to start a discussion about forking the project; it would be the worst possible outcome of my intention ... I would like to get a tool set based on BB's principles, but giving more flexibility to fit more people's preferences, without breaking things for others (at least the majority)! ... this means critical discussion of every relevant topic, but not blocking every new approach and functionality with the argument of size constraints or featuritis, due to personal dislikes, and then accepting a patch which adds several hundred bytes for functionality I consider pure nonsense or featuritis. I apologize for my hard words. I don't want to hurt you or anybody else, but you made several decisions in the past which resulted in immense frustration for me (and others), with the consequence of even halting development of several BB-focused projects ... please consider being more open to the discussion, based on topics, not on pure criticism or personal liking (I don't want to initiate lengthy philosophical quarrels with no practical outcome). -- Harald
Re: RFD: Rework/extending functionality of mdev
Hi William !

On 11.03.2015 17:03, William Haddon wrote: I suppose it's time to dig out code from the secret archives of my secret lair again. Someone named Vladimir Dronnikov called this ndev and proposed it as a patch to Busybox in 2009. I dug it out and separated it from Busybox, and probably made some other changes I don't remember, for some nefarious agenda I again don't remember. It is a modified version of mdev that seems to do some of the things you all have been talking about. I'm not maintaining it in any way, so you all are welcome to do whatever you want with it. I have no idea whether it works or not, other than that it apparently was once useful to me.

Maybe worth digging into, or not? ... I'm currently in the phase of collecting and discussing functionality, not concerned with code hacking. It would get much more interesting if you could say in which functionality it differs. What was the author's intention in forking mdev? And maybe, why has it been neglected? Otherwise: I saved your message and may come back to this when I start looking at concrete code, or when I'm searching for a specific functionality and how it is handled by other developers. So, thanks for the information. -- Harald
Re: RFD: Rework/extending functionality of mdev
On 3/11/2015 9:30 AM, Harald Becker wrote: So how can we avoid that unwanted parallelism, but still enable all of the above usage scenarios *and* still have a maximum of code sharing *and* a minimum of memory usage *without* delaying the average event handling too much? The gathering parts need to acquire the device information, sanitize this information, and serialize the event operations to the right order. The device node handling part shall receive the information from the gathering part(s) (whichever is used) and call the required operations, but shall avoid reparsing the conf on every event (speed up) *and* drop as much memory usage as possible, when the event system is idle. My idea is a fifo approach. This allows splitting the device management functionalities. Nevertheless which approach to gather the device information is used, the parser and device handling part can be shared (even on a mixed usage scenario).

Supposing that we have
* mdev acting as a parallel hotplug handler forked by the kernel, and then add
* mdevd, which reads netlink messages and runs as a daemon
what specifically is the appeal of a third approach which tries to re-create the kernel netlink design in user-land using a fifo written from forked hotplug helpers? I'm interested in this thread, but there is too much to read. Can you explain your reason in one concise paragraph? -Mike
Re: RFD: Rework/extending functionality of mdev
Hi Isaac !

Agreed, whole-heartedly. I just don't think you're quite thinking through exactly what this means.

In which sense? Let me know what I got wrong. I'm a human, making mistakes, not a perfect machine.

Agreed. But I would include hotplug daemons under etc.

I used etc. so add any similar type of program you like, including hotplug daemons ... but stop, what are hotplug daemons? Do you mean daemons like udev? udev uses netlink reading. Otherwise I know about hotplug helper programs (current mdev operation), but not about hotplug daemons.

The gathering parts need to acquire the device information, sanitize this information, and serialize the event operations to the right order. The device node handling part shall receive the information from the gathering part(s) (whichever is used) and call the required operations, but shall avoid reparsing the conf on every event (speed up) *and* drop as much memory usage as possible, when the event system is idle.

That *shall* is where you're forgetting about the first principle you mentioned (assuming that you mean shall in the normative sense used in RFCs).

??? Sorry, maybe it's because I'm not a native English speaker. Can you explain to me what is wrong? How should it be?

Yes, some people find the hotplugger too slow. But that doesn't mean that everyone wants a new architecture.

What do you mean by new architecture? A different system setup? Changed configuration files? My first approach was not to change the usage of current systems. Apart from a slightly bigger BB, you would not have noticed my modifications. Then came Laurent with some questions and suggestions to split off some things for clarification, so the changes may result in slightly modified applet names and/or parameter usage (still under discussion), to be able to adopt all the functionality ... but otherwise you won't need to change your setup if you do not want to. 
Some people, at some points in time, would prefer to use a plain, simple, hotplugger, regardless of whether it's slow.

??? Didn't you notice the following: - using the hotplug handler approach of the kernel (current operation of mdev)

(Personally, I'd like a faster boot, but after that a hotplugger that doesn't daemonize is fine by me.)

Do you really like forking a separate conf parser for each hotplug event, even if they tend to arrive in bursts? Wouldn't you like to get a faster system startup with mdev, without changing system setup / configuration?

So, in order to respect that preference, it would be nice if you could let the hotplug part of mdev keep doing what it does.

What do you expect the hotplug part to be? The full hotplug handler including conf file parser, spawned in parallel for each event? Wouldn't you like to benefit from a faster system startup, only because you fear there is another (minimalistic) daemon sitting in the back? Sounds like automobiles are dangerous, I won't use them? ... sorry if this sounds bad; I'm trying to understand what exactly you are fearing ... I expect you misunderstood something (or I explained / translated it wrongly).

My idea is a fifo approach. This allows splitting the device management functionalities. Nevertheless which approach to gather the device information is used, the parser and device handling part can be shared (even on a mixed usage scenario).

I understand that the goal here is to allow people to use netlink or hotplug interchangeably with mdev -p (which I still think is a poorly named but very desirable feature).

Please don't fixate on those specific parameter names; think of the specific functionalities. The names are details under discussion ... but here especially I expect you misunderstand something. mdev -p would be for internal purposes, to distinguish invocation from the usual mdev and mdev -s usage (current mdev). 
So if you don't change your system setup to benefit from the extra functionality, you won't ever need mdev -p; it is for internal usage and special purposes (the p stands for parser, a quick and dirty selection, to use something).

As stated before, I don't think that this approach is really functional, and would be more opposed to using it than to using netlink or a plain hotplugger. For this reason, I'm opposed to including it in *mdev*.

??? Not functional? In which way? What do you fear?

I also think that those who *do* want to use this approach would benefit more from a non-busybox binary, since the hotplugger needs to be as small and minimal as possible. Hence, I suggest doing it outside busybox.

Yes, that hotplug helper may benefit from being as small and fast as possible, but a separate program means a separate binary, and that conflicts with my one-single-static-binary preference, which I share with others. I consider splitting off that helper into a separate binary to be under discussion, but otherwise it won't change anything in the concept (it is not much more than a compile / link question).
Re: RFD: Rework/extending functionality of mdev
On 11.03.2015 22:44, Michael Conrad wrote: What specifically is the appeal of a third approach which tries to re-create the kernel netlink design in user-land using a fifo written from forked hotplug helpers?

You mix things up a bit. My approach allows using either the netlink or the kernel hotplug method, sharing the code and invoking only one instance of the parser / handler even on hotplug event bursts. The third method is the initial device file system population. Splitting this into different processes is the Unix way of multi-threading, using a fifo (= named pipe) for inter-process communication.
Re: RFD: Rework/extending functionality of mdev
Hi Michael !

On 11.03.2015 22:44, Michael Conrad wrote: I'm interested in this thread, but there is too much to read. Can you explain your reason in one concise paragraph?

One paragraph is a bit too short, and my English sucks, but I'll try to summarize the intention of my approach in compact steps (a bit more than one screen page of text):

- current kernel-hotplug-based mdev suffers from parallel starts and from reading / scanning the conf in each instance
- the known and proven solution to this is a netlink reader daemon (a long-lived daemon which stays active all the time)
- there are still people who insist on staying with the kernel hotplug feature (for whatever reason, so accept that we need hotplug + netlink)
- the third method (scanning the sys file system) is the operation of mdev -s (initial population of the device file system; don't mix it up with mdev w/o parameters); in addition some users take this as semi-automatic device management (embedded world)
- so how can we have all three methods without duplication of code, plus how can we speed up the kernel hotplug method?
- the answer is to split the gathering parts from the conf file parser and device operation, plus let the parser / handler accept many events with one invocation (event bursts), plus save memory when the event system is idle (the parser process exits when idle for some duration)
- the kernel hotplug helper could fire up the fifo and a parser / handler when one is required, but this check adds extra delay / cost on the first / all delivered events
- the solution is a minimalistic fifo watcher and parser startup daemon (a proven Unix concept for on-demand N-to-1 inter-process communication); the fifo watcher creates and holds the fifo open, but never touches data in the fifo, only starts up a parser when required, which allows failure management when the parser dies
- now the system maintainer can decide which method to use, unwanted methods may be opted out in the BB config, plus easier embedding of BB-based device management from external programs (include the parser, drop unwanted methods)
- besides the netlink code, after the rework I expect a near 1:1 average binary size compared to the current code, but less memory usage on event bursts (only one parser process), plus a speed improvement on event bursts (faster system startup when using hotplug)
- other intended functional improvements are a personal preference: the ability of a one-shot device file system startup (a single command to set up all device stuff, still under full control of the admin, no hard-coded functionality in any binary)

And last: don't stick to the mdev -... names mentioned; look at the intended functionalities, the implementation details (names to use) are still under discussion. Hope that was short enough. -- Harald
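The fifo-watcher step above can be sketched in shell. This is a toy illustration of the mechanism only, not mdev code: the fifo path, the log file, and the echo-only "parser" are all made up for the demo, and `read -t` is a busybox-ash/bash extension.

```shell
#!/bin/sh
# Illustrative sketch of the fifo-watcher idea: the watcher holds the
# fifo open, N writers (hotplug helpers) write one line per event, and
# a single on-demand parser drains a whole burst, then exits when idle.
FIFO=/tmp/mdev.fifo
LOG=/tmp/mdev.log

parser() {
    # One instance handles an entire burst; after 1 idle second the
    # read times out and the parser exits, freeing its memory.
    while read -t 1 action devpath; do
        echo "handled $action $devpath"
    done
}

rm -f "$FIFO" "$LOG"
mkfifo -m 600 "$FIFO"

# Holding the fifo open read-write means writers never block and the
# parser never sees EOF; the idle timeout is what ends the parser.
exec 3<>"$FIFO"
parser <&3 > "$LOG" &

# Event producers just write one line per event:
echo "add /devices/platform/serial8250" > "$FIFO"
echo "remove /devices/platform/serial8250" > "$FIFO"

wait    # parser exits about 1s after the last event
cat "$LOG"
```

A real fifo watcher would also respawn the parser on the next event after an idle exit; that respawn loop is omitted here to keep the mechanism visible.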
Re: RFD: Rework/extending functionality of mdev
On Tue, 10 Mar 2015 17:26:20 +0100 Harald Becker ra...@gmx.de wrote: First look at what to do, then decide how to implement. mdev needs to do the following steps: - on startup the sys file system is scanned for device entries

You don't want to scan sysfs for device entries. Instead you want to trigger hotplug events. Something like:

find /sys -type f -name uevent | while read entry; do
    echo add > $entry
done

- as a hotplug handler, for each event a process is forked, passing information in environment variables

You don't want to fork a process for every hotplug event. Instead you want to catch the hotplug events with netlink.

- when using netlink, a long-lived daemon reads event messages from a network socket to assemble the same information as the hotplug handler

You don't want to collect the same info twice. Instead you want a netlink listener (a long-lived daemon) that is minimal, and then a hotplug handler that deals with the events. That is what mdev does. It even handles MODALIAS events, which are not device entries.

- when all information for an event has been gathered, mdev needs to search its configuration table for the required entry, and then ... - ... do the required operations for the device entry

We can simplify what needs to happen at boot to: - prepare /sys /proc /dev - set up hotplug handler - trigger hotplug events

You can currently do that very simply with existing mdev:

# prepare /sys /proc /dev
mount ... /sys ...
# set up hotplug handler
echo /sbin/mdev > /proc/sys/kernel/hotplug
# trigger hotplug events
find /sys -type f -name uevent | while read entry; do
    echo add > $entry
done

This works but will trigger tons of forks and cannot guarantee that the events are handled in the correct order - a serialization problem. I think mdev has a hack (/dev/mdev.seq) to resolve the serialization problem. The performance is still bad due to all the forks. To avoid the many forks you need to use a netlink listener. This will also solve the serialization problem properly. 
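The coldplug loop above can be exercised without touching a live system. In this sketch the scratch tree path and device directories are made up for the demo; on a real system the tree is /sys and the writes need root, at which point the kernel re-emits the hotplug event for each device as if it had just been plugged in.

```shell
#!/bin/sh
# Coldplug replay: writing "add" into a device's uevent file asks the
# kernel to re-emit that device's hotplug event. A scratch tree stands
# in for /sys here so the loop itself can be demonstrated safely.
SYSFS=/tmp/fakesys
mkdir -p "$SYSFS/block/sda" "$SYSFS/class/tty/ttyS0"
: > "$SYSFS/block/sda/uevent"
: > "$SYSFS/class/tty/ttyS0/uevent"

# The trigger loop, with quoting so paths with spaces survive:
find "$SYSFS" -type f -name uevent | while read -r entry; do
    echo add > "$entry"
done

cat "$SYSFS/block/sda/uevent"    # prints "add"
```

On the real /sys only root may write to uevent files, and each write is what causes the registered hotplug helper (or a netlink listener) to see the event.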
You can solve this by adding netlink support to mdev and turning it into a daemon. Then the netlink listener and the event handler are in the same executable, similar to what udev does. This has a few drawbacks:

- the handler code will always be running, e.g. the mdev.conf parsing code will be in the daemon all the time. It can be discussed whether this is a real problem or not, because the kernel memory manager will handle memory usage and reuse inactive memory properly.
- mdev is currently designed to exit. Turning it into a daemon will likely require some work on error handling.

I think it is a better idea to leave mdev as a short-lived process which exits when it's done. We can do that by separating out the netlink listener into a separate daemon which forks mdev and sends a batch of events via a pipe - as explained in my previous email.

... Both the sys file system scanner and a netlink daemon could easily establish a pipe and then send device commands to the parser. The sys file system scanner (startup = mdev -s) can create the pipe, then scan the sysfs and send the commands to the parser. When done, the pipe can be closed and, after waiting for the parser process, just exit.

You don't want the sysfs scanner to send device commands anywhere. You simply trigger the hotplug events with echo add > /sys/./uevent and then the netlink listener will deal with them as if the devices had been hotplugged.

The netlink daemon can establish the netlink socket, then read events and sanitize the messages. When there is any message for the parser, a pipe is created and messages can be passed to the parser. When netlink is idle for some amount of time, it can close the pipe and check the child status.

The netlink listener daemon will need to deal with the event handler (or parser, as you call it) dying. I mean, the handler (the parser) could hit some error - out of memory, someone broke mdev.conf, or anything else that causes mdev to exit. 
If the child (the handler/parser) dies, a new pipe needs to be created and the handler/parser needs to be re-forked. With that in mind, wouldn't it be better to have the timer code in the handler/parser? When no new messages come from the pipe within a given time, the handler/parser just exits.

Confusion arises only on the hotplug handler part, as here a new process is started for every event by the kernel. Setting up a pipe to send this to the parser would double the overhead. But leaving the parser running for some amount of time would only work with a named fifo, starting the parser when required, and adding timeout management to the parser ... named pipes will just make things more complicated. -nc
Re: RFD: Rework/extending functionality of mdev
On Tue, 10 Mar 2015 17:56:59 + Isaac Dunham ibid...@gmail.com wrote: Now, a comment or two on design of mdev -i and the netlink listener: I really *would* like there to be a timeout of some sort; in my experience, 1 second may be rather short, and 5 seconds is usually ample.

Just one more comment on the default time-out value. The idea was to reduce the number of forks, not completely remove them. We want to make each burst (many events within a short time) go via the pipe, while we allow the less frequent events to use fork. If there is a long delay due to slow devices like USB1 etc., then the handler will exit for a while, and once the delayed event comes in, the handler will just auto-respawn. We will never have more than one fork per second, though. We don't even need to make it configurable unless we think that one fork per second is too many forks. -nc
Re: RFD: Rework/extending functionality of mdev
On Tue, 10 Mar 2015 17:56:59 + Isaac Dunham ibid...@gmail.com wrote: On Tue, Mar 10, 2015 at 01:37:41PM +0100, Harald Becker wrote: Hi Laurent ! ... I dislike the idea of integrating early init functions into mdev, because those functions are geographically (as in the place where they're performed) close to mdev, but they have nothing to do with mdev logically. Sorry, I don't agree. mdev's purpose is to setup the device file system infrastructure, or at least to help with setting up this. Ok, at first leave out proc and sysfs, there you may be right, but what about devpts on /dev/pts, or /dev/mqueue, etc., and finally the tmpfs for /dev itself. Do you really say they do not belong to the device file system infrastructure? Or all those symlinks like /dev/fd, /dev/stdin, /dev/stdout, etc., do you consider them not to belong to mdev? Not talking about setting owner informations and permissions for those entries in the device file system.

In my humble opinion, these do not belong to mdev the busybox applet. +1

... Factors that make me want this: * mdev can noticeably slow down boot (I've set up mdev as hotplug helper with Debian, and notice that the resulting initrd takes a second or two more than udev to load.)

I would assume that loading all the modules in Debian results in a fork bomb. This is why I want a separate pipe for modaliases. I want to disable MODALIAS handling in mdev and add modprobe -i support with a timeout feature.

* hotplug is a rare event, but with modern hardware it frequently causes a sudden burst of activity. A flash drive frequently triggers 5+ events, and plugging in a phone / turning on mass storage may cause a dozen or more. Even plugging in an SD card may cause 3 or more events.

Exactly. The events come in bursts. That is why I think it makes sense to auto-spawn the handler, forward the events via a pipe, and exit the handler in periods with no activity. 
The long-lived instance probably needs to stay out of busybox to really minimize memory use, though.

+1 Certainly it shouldn't be in the same binary as mdev. (Some people use multiple busybox binaries, which is probably more optimal.)

Agree. ... Now, a comment or two on the design of mdev -i and the netlink listener: I really *would* like there to be a timeout of some sort; in my experience, 1 second may be rather short, and 5 seconds is usually ample. I also think that the timeout should be adjustable. Now, the timeout could be done in at least two ways:

(a) timeout in mdev -i:
    - disable the timeout on read (or set a timeout for executing rules)
    - parse input and execute rules
    - at the end of execution, reset the timeout for read

(b) timeout in the netlink daemon:
    - listen for events
    - on event: disable the timeout, check the state of mdev/pipe, respawn if needed, write the event to the pipe, check the write result (if needed, retry), reset the timeout
    - on timeout: close the pipe and let mdev die when it finishes

Everything but the timeout has to be done anyhow in both mdev and the netlink daemon. In (a), the timeout could be set on the mdev commandline, or in mdev.conf; if it is set on the mdev commandline, *that* needs to be specified on the netlink daemon's commandline or in its config file.

If you set it on the command line, then you need to respawn the netlink listener if you want to change it (or have the netlink listener re-read a config file on every event; I don't think we want that). If we use mdev.conf, you can just change that file and kill mdev, and the new setting is active. Not that this is a big issue - just an interesting consequence I had not thought of.

In (b), the timeout is specified on the netlink daemon's commandline or in its config file. Somehow, I find myself preferring option (b); managing timeouts seems to be the job of a daemon rather than a hotplugger, and it fits logically with everything else that the netlink daemon is doing. 
I tend to prefer option (a), because the netlink daemon needs to deal with a killed mdev anyway. If we let mdev handle the timeout, then the netlink daemon only needs to check for POLLHUP on the pipe fd. This also has the benefit that the long-lived daemon becomes smaller. -nc
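Option (a) - the handler owning the timeout - can be sketched in a few lines of shell. Here `process_event` is a hypothetical stand-in for mdev's rule matching, and `read -t` is a busybox-ash/bash extension; this is an illustration of the mechanism, not mdev code.

```shell
#!/bin/sh
# Sketch of option (a): events arrive on stdin (the pipe from the
# netlink daemon); when the pipe goes quiet, read times out and the
# handler exits, so the daemon only has to notice POLLHUP/EPIPE and
# respawn the handler on the next event.
process_event() {
    # Hypothetical stand-in for matching an event against mdev rules.
    echo "rule-match: $1"
}

handler() {
    while read -t 5 line; do
        process_event "$line"
    done
    # Reached on idle timeout or on EOF (daemon closed the pipe):
    # exiting here frees all parser state until the next burst.
}

# Demo: feed a two-event burst through a pipe, as the daemon would.
printf 'add sda\nadd sda1\n' | handler
```

Because the pipe is closed after the demo burst, the loop ends on EOF immediately rather than waiting out the 5-second idle timeout.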
Re: RFD: Rework/extending functionality of mdev
Hi Laurent !

1) Starting up a system with mdev usually involves the same steps: mount proc, sys and a tmpfs on dev, then add some subdirectories to /dev, set owner, group and permissions, symlink some entries and possibly mount more virtual file systems. This work needs to be done by the controlling startup script, so we reinvent the wheel in every system setup.

The thing is, this work doesn't always need to be done, and when it does, it does whether you use udevd, mdev, or anything else. /dev preparation is one of the possible building blocks of an init routine, and it's quite separate from mounting /proc and /sys, and it's also separate from populating /dev with mdev -s.

You missed the fact that I said everything stays under the control of the admin or the system setup. Nobody shall be forced to do things in a specific way. I just want to give some extended functionality and flexibility to do this setup in a very easy and, IMO, straightforward-looking way. If you like/need to do those mounts in a specific way, just put them in your script and leave the mount lines out of mdev.conf; otherwise, if you need to set up those virtual file systems with specific options, you can specify them on the same configuration line.

... I dislike the idea of integrating early init functions into mdev, because those functions are geographically (as in the place where they're performed) close to mdev, but they have nothing to do with mdev logically.

Sorry, I don't agree. mdev's purpose is to set up the device file system infrastructure, or at least to help with setting it up. Ok, at first leave out proc and sysfs, there you may be right, but what about devpts on /dev/pts, or /dev/mqueue, etc., and finally the tmpfs for /dev itself? Do you really say they do not belong to the device file system infrastructure? Or all those symlinks like /dev/fd, /dev/stdin, /dev/stdout, etc., do you consider them not to belong to mdev? 
Not to mention setting ownership information and permissions for those entries in the device file system. Sure, all that can be done with some shell scripts, scattering all that information around different places, or with the need to set up and read/process different configuration files. My intention is to get this information, which depends on the system setup, into a single, relatively simple to use and modifiable location. The startup script itself just invokes mdev, which gets the information from /etc/mdev.conf and calls the necessary commands to do the operation. That is, it frees the distro manager from writing script code to put the system-specific information into the startup script. I consider Busybox to be a set of tools anybody may use to set up a system to their own wishes, not a means to force anybody to do things in the way any one person may feel is the best way; I want to enable functionality for those who like to collect this information and put it in a central place - information usually scattered around and hidden deep in the scripts controlling the startup.

MOUNTPOINT UID:GID PERMISSIONS %FSTYPE [=DEVICE] [OPTIONS]

I can't help but think this somehow duplicates the work of mount with /etc/fstab. If you really want to integrate the functionality in some binary instead of scripting it, I think the right way would be to patch the mount applet so it accepts any file (instead of hardcoding /etc/fstab), so you could have an early mount points file, distinct from your /etc/fstab, that you would call mount on before running mdev -s.

Laurent, you didn't look at the examples. I do not want to hard-code any *functionality* in mdev. I want to add extra functionality to the current mdev.conf syntax to allow doing some more stuff which is usually done geographically close to calling mdev, and done on so many systems in a similar manner. Look at the usual usage of /etc/fstab: on how many systems do you find information about your virtual file systems there? 
Usually fstab is used for the disk devices. In addition, what about creating the mount points and setting owner, group and permissions? This is not done by mount and not specified in fstab. So changing anything there would mean modifying the fstab syntax, possibly breaking other programs and scripts that read and modify fstab. Neither do I want to code any special functionality into a binary, nor do I try to duplicate the operation of mount. I just want to extend the mdev.conf syntax to add simple configuration information for those close-to-mdev operations at a central place, parse this information and call the usual commands, e.g. mount with the right options, as shell scripts do. And what else are a few lines in mdev.conf describing those mounts, other than placing them in a separate early mount-points file?

... ok, let's go one step further on this. Let us add an include option to mdev.conf, which allows splitting the mdev configuration into different files, and/or place
Re: RFD: Rework/extending functionality of mdev
Hi Harald, I'm sorry if I came across as dismissive or strongly opposed to the idea. It was not my intent. My intent, as always, is to try and make sure that potential new code 1. is worth writing at all, and 2. is designed the right way. I don't object in order to discourage you, I object in order to generate discussion. Which has been working so far. ;)

You missed the fact that I said everything stays under the control of the admin or system setup. Nobody shall be forced to do things in a specific way. I just want to give some extended functionality and flexibility to do this setup in a very easy and, IMO, straightforward way.

I understand your point. If I don't modify my mdev.conf, everything will still work the same and I can script the functionality if I prefer; but if you prefer centralizing the mounts and other stuff, it will also be possible to do it in mdev.conf. It is very reasonable. The only questions are: - what are the costs of adding your functionality to mdev.conf ? - are the benefits worth the costs ? You may have noticed that I'm very conservative as far as code is concerned ;)

You're saying "I like centralized information, the benefits are obvious to me" and I'm answering "I prefer scripting, so I don't see the benefits of your change". It's a debate of taste, and we'll never get anywhere like this. We need to dig a little more. As a preamble, let me just say that if you manage to make your syntax extensions a compile-time option and I can deactivate them to get the same mdev binary as I have today, then I have no objection, apart from the fact that I don't think it's good design - see below.

Sorry, I don't agree. mdev's purpose is to set up the device file system infrastructure, or at least to help with setting it up. OK, leave out proc and sysfs for now, there you may be right, but what about devpts on /dev/pts, or /dev/mqueue, etc., and finally the tmpfs for /dev itself? Do you really say they do not belong to the device file system infrastructure?
Or all those symlinks like /dev/fd, /dev/stdin, /dev/stdout, etc., do you consider them not to belong to mdev? Not to mention setting owner information and permissions for those entries in the device file system.

OK, now this is interesting. I firmly believe that most of our disagreement comes from the fact that mdev and mdev -s are the same binary, but *not* the same functionality at all. To me, mdev.conf is a configuration file for mdev functionality, i.e. configuration for a hotplug helper. Since mdev can be invoked on every hotplug event, it's very important that mdev.conf parsing remain fast. You are saying that you would benefit from user interface improvements to the /dev *initialization* functionality, i.e. mdev -s. I understand, and I agree that a one-stop shop for /dev initialization would be nice.

My point, which I didn't make clear in my previous message because I was too busy poking fun at you (my apologies, banter should never override clarity), is that I find it bad design to bloat the parser for a hotplug helper configuration file in order to add functionality for a /dev initialization program, which has *nothing to do* with hotplug helper configuration. The confusion between the two comes from two places:

- the applet name, of course; I find it unfortunate that mdev and mdev -s share the same name, and I'd rather have them called mdev and mdev-init, for instance.
- the fact that mdev's activity as a hotplug helper is about creating nodes, fixing permissions, and generally doing stuff around /dev, which is, as you say, logically close to what you need to do at initialization time, so at first sight the configuration file looks like a tempting place to put that initialization-time work.

But I maintain that mdev.conf should be for mdev, not for mdev -s. mdev -s is just doing the equivalent of calling mdev for every device that has been created since boot. If you make a special section in mdev.conf for mdev -s, this 1.
blurs the lines even more, which is not desirable, and 2. bloats the parser for mdev with things it does not need after the first mdev -s invocation.

Sure, all that can be done with some shell scripts, scattering all that information around at different places, or requiring different configuration files to be set up and read/processed. My intention is to get this information, which depends on the system setup, into a single location that is relatively simple to use and modify.

I agree with that goal. I just don't think mdev.conf is the place to do it.

The startup script itself just invokes mdev, which gets the information from /etc/mdev.conf

No: the startup script invokes mdev -s, which invokes (or does the equivalent of invoking) mdev, which gets the information from mdev.conf. But the functionality you're planning to add is specific to mdev -s: the actions would be taken by mdev -s but outside of its series of mdev invocations.

And what else are a few lines, in mdev.conf, describing those mounts, other
Re: RFD: Rework/extending functionality of mdev
On Tue, Mar 10, 2015 at 01:37:41PM +0100, Harald Becker wrote: Hi Laurent ! ... I dislike the idea of integrating early init functions into mdev, because those functions are geographically (as in the place where they're performed) close to mdev, but they have nothing to do with mdev logically. Sorry, I don't agree. mdev's purpose is to set up the device file system infrastructure, or at least to help with setting it up. OK, leave out proc and sysfs for now, there you may be right, but what about devpts on /dev/pts, or /dev/mqueue, etc., and finally the tmpfs for /dev itself? Do you really say they do not belong to the device file system infrastructure? Or all those symlinks like /dev/fd, /dev/stdin, /dev/stdout, etc., do you consider them not to belong to mdev? Not to mention setting owner information and permissions for those entries in the device file system.

In my humble opinion, these do not belong to mdev the busybox applet. "Do one thing" means *method*, not just problem area - just like cut, head, and tail are separate programs, despite all selecting portions of a text input based on mathematical criteria. And mdev takes care of setting up devices by looking at information from the kernel (via uevent or environment; the two are very similar). If you don't make this distinction, how do you argue that lspci should not query the PCI ID database via DNS (-Q), or that it's out of scope for an init system to fold in the device manager?

Now, if you added support for using busybox as pre-compiled init scripts, it would be reasonable for an applet to do this. (Yes, I've heard of such things being done with busybox.)

... but here it's different. s6-devd spawns a helper program for every event, like /proc/sys/kernel/hotplug would do, but it reads the events from the netlink so they're serialized.

Spawning a process for every event produces a massive slowdown on system startup.
My intention is to fork only one process, parse the conf table once and then read sanitized events from stdin, scanning the in-memory conf table and invoking the right operations.

If you use a format similar to uevent (eg, VAR1=VAL1\0VAR2=VAL2\0\0 or something similar) this sounds desirable to me. I suppose this could be mdev -i. However, I'd suggest first figuring out how to make environment variable hooks work the same in mdev -s/-i and plain mdev. Otherwise, mdev -i would be at a disadvantage. Note: while it's trivial to swap out the environment with each event, this is probably a Bad Idea for a longer-lived process, due to the potential for leaks and other chaos.

Factors that make me want this:
* mdev can noticeably slow down boot (I've set up mdev as hotplug helper with Debian, and noticed that the resulting initrd takes a second or two more than udev to load.) I would assume that loading all the modules in Debian results in a fork bomb.
* hotplug is a rare event, but with modern hardware it frequently causes a sudden burst of activity. A flash drive frequently triggers 5+ events, and plugging in a phone / turning on mass storage may cause a dozen or more. Even plugging in an SD card may cause 3 or more events.

In your design, the long-lived process forks a unique helper that reads mdev.conf... so it's not meant to be used with any program compatible with /proc/sys/kernel/hotplug - it's only meant to be used with mdev. So, why fork at all ? Your temporary instance is unnecessary - just have a daemon that parses mdev.conf, then listens to the netlink, and handles the events itself.

No, it is not unnecessary. The long-lived instance tries to stay at low memory usage, then forks, and the second instance reads mdev.conf into the memory table. When the event system goes idle, as it is most of the time, the second instance dies and its memory is freed to the system.
When I'm doing something particularly memory-intensive (in my case, *usually* linking a large program), I sometimes stop every service and program I can. This has allowed me to build at least one program I could not have built otherwise. There may be times when there's memory pressure that the user is unaware of, but frequently the user controls both memory pressure and hotplug.

The long-lived instance probably needs to stay out of busybox to really minimize memory use, though. It certainly shouldn't be in the same binary as mdev. (Some people use multiple busybox binaries, which is probably closer to optimal.)

Seriously, udev is a hard problem. udevd is bloated and its configuration is too complex; mdev is close to the point where parsing a config file for every invocation will become too expensive - I believe that it has not reached that point yet, but if you add more stuff to the conf language, it will.

Right, adding more to the conf language adds complexity there, but it removes complexity from the surrounding scripts and collects system-specific information at a central place (or places), usually hidden
Re: RFD: Rework/extending functionality of mdev
Hi, getting hints and ideas from Laurent and Natanael, I found we can get the most flexibility when we try to modularize the steps done by mdev. At first there are two different kinds of work to deal with:

1) The overall operation and usage of netlink
2) Extending the mdev.conf syntax

Both are independent, so first look at the overall operation ... and we are currently looking at operation/functionality. This does not mean they are all separate programs/applets; we may put several functionalities in one applet and distinguish them by options. First look at what to do, then decide how to implement it.

mdev needs to do the following steps:
- on startup, the sys file system is scanned for device entries
- as a hotplug handler, a process is forked for each event, with information passed in environment variables
- when using netlink, a long-lived daemon reads event messages from a network socket and assembles the same information as the hotplug handler
- when all information for an event has been gathered, mdev needs to search its configuration table for the required entry, and then ...
- ... do the required operations for the device entry

That is, the sys file system scan, the hotplug event handler and the netlink event daemon all trigger the operation of the mdev parser. Forking a conf file parser for each event is a lot of overhead on system startup, when many events arrive in a short amount of time. There we would benefit from a single process reading from a pipe, dying when there are no more events and being re-established when new events arrive. Both the sys file system scanner and a netlink daemon could easily establish a pipe and then send device commands to the parser. The parser reads mdev.conf once and creates an in-memory table, then reads commands from the pipe and scans the memory table for the right entry. On EOF on the pipe, the parser can exit successfully. The sys file system scanner (startup = mdev -s) can create the pipe, then scan the sysfs and send the commands to the parser.
When done, the pipe can be closed, and after waiting for the parser process, it just exits. The netlink daemon can establish the netlink socket, then read events and sanitize the messages. When there is a message for the parser, a pipe is created and messages can be passed to the parser. When netlink is idle for some amount of time, it can close the pipe and check the child status.

Confusion arises only on the hotplug handler part, as here a new process is started for every event by the kernel. Forking a pipe to send this to the parser would double the overhead. And leaving the parser running for some amount of time would only work with a named fifo, starting the parser on demand, and would add timeout management to the parser ...

... but ok, let's look at an alternative: Consider a small long-lived daemon, which creates a named fifo and then polls this fifo until data becomes available. On a hotplug event a small helper is started, which reads its information, serializes it, writes the command to the fifo and exits. The long-lived daemon sees the data (but does not read it), then forks a parser and gives the read end of the fifo to the parser. The parser reads mdev.conf once, then processes commands from the fifo. Now we are at the situation where the timeout needs to be checked in the parser. When there are no more events on the fifo, the parser just dies successfully (freeing the used memory). This will be detected by the small long-lived daemon, which checks the exit status and can act on failures (e.g. run a failure script). On successful exit of the parser, the daemon goes back to waiting for data on the fifo (which it still holds open for reading and writing).

This way the hotplug helper will benefit from a single-run parser on startup, but the memory used by the conf parser is freed during normal system operation. The doubling of the timeout management in the netlink daemon and the parser can be intentional when different timeouts are used.
Where a small duration can be chosen for the idle timeout of netlink, the parser itself uses a higher timeout, which only triggers when the hotplug helper method is used. Yes, there are some rough corners, but we are in the brainstorming phase. Besides those corners, we get a modular system which avoids respawning/rereading the conf table for every event, but frees memory when there are no more events. Even the hotplug helper method will benefit, as the helper process can exit as soon as the command has been written to the fifo. The parser reads serialized commands from the pipe and processes the required actions. Maybe we should consider using that small parser helper daemon and the named fifo in all cases. The sys file system scanner, hotplug helper and netlink daemon would then just use the fifo. This would even allow using the same fifo to activate the mdev parser from a user-space program (including a single parser start for multiple events).
Re: RFD: Rework/extending functionality of mdev
Hi Natanael !

I am interested in a netlink listener too, for 2 reasons: - serialize the events - reduce the number of forks for performance reasons

My primary intentions. That is, I want to auto-fork a daemon which just opens the netlink socket. When events arrive it forks again, creating a pipe. The new instance reads mdev.conf, builds a table of rules in memory, then reads hotplug operations from the pipe (sent by the first instance). When there are no more events for more than a few seconds, the first instance closes the pipe and the second instance exits (freeing the used memory). On the next hotplug event a new pipe / second instance is created.

I have a similar idea, but slightly different. I'd like to separate the netlink listener and the event handler.

Ack. After thinking about Laurent's message, I came to this, too. Split off the netlink part and use a pipe for communication. That can even go further: split off the initial sys scanning and hotplug parts from the parser and also use the pipe to communicate, creating an mdev wrapper around this, so handling stays as is. This does not mean we need separate applets; it may all be included in one mdev applet with the operation controlled by options. Later I will write a reply to Laurent's message, going into more detail.

I am thinking of using http://git.r-36.net/nldev/ which basically does the same thing as s6-devd: minimal daemon that listens on netlink and for each event fork/execs mdev.

Ok, this may be a second alternative. As I do not want to reinvent the netlink part, I will take a deep look at the possible alternatives and try to adapt them for Busybox.

- the mdev pipe fd is added to the poll(2) call so we catch POLLHUP to detect mdev timeout. When that happens, set the mdev pipe fd to -1 so we know that it needs to be respawned on the next kernel event.

Why do it in such a complicated way? The mdev parser shall just read simple device add/remove commands from stdin until EOF, then exit. That's it.
The netlink part can easily watch how long it has been idle and then just close the pipe. As soon as more events arrive it creates a new pipe and forks another mdev parser. This needs time management and poll in only one program. All other code is simple and straightforward. The netlink reader, as a long-lived daemon, already needs to watch the forked processes and act on failures. ... but those are implementation details and some optimization. I agree on the ideas/functionality behind this.

The benefits: - the netlink listener, which needs to be running at all times, is very minimal.

This may be an argument for not linking the netlink part into BB, but it does not otherwise change the idea behind this.

- when there are many events within a short time (eg coldplugging), we avoid the many forks and gain performance. - when there are no events, mdev will timeout and exit.

ACK

- busybox mdev does not need to set up a netlink socket. (less intrusive changes in busybox)

This can be done as an alternative, so the admin may decide whether he likes to use netlink or the hotplug helper. That is, nobody is forced to handle things in a special way. BB shall just give the tools to build easy setups. Unwanted parts/applets may be left out in the BB configuration (if size matters, else default all tools in and let the admin choose).

Then I'd like to do something similar with modprobe: - add support to read modalias from stdin and have 1 sec timeout. - have nldev pipe/fork/exec modprobe --stdin on MODALIAS events.

Nice idea. Maybe with some slight modification to optimize the timeout handling.

That way we can also avoid the many modprobe forks during coldplug.

ACK, so the project needs a wider view. Thanks for pointing me to this. ... later more details in my reply to Laurent's message. -- Harald ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: RFD: Rework/extending functionality of mdev
On Sun, 08 Mar 2015 16:10:32 +0100 Harald Becker ra...@gmx.de wrote: Hi, I'm currently in the phase of thinking about extending the functionality of mdev. As the experts working with this kind of software are here, I'd like to hear your ideas before I start hacking the code. ... 2) I'd like to use netlink to obtain hotplug information and avoid massive respawning of mdev as hotplug helper when several events arrive quickly.

I am interested in a netlink listener too, for 2 reasons: - serialize the events - reduce the number of forks for performance reasons

That is, I want to auto-fork a daemon which just opens the netlink socket. When events arrive it forks again, creating a pipe. The new instance reads mdev.conf, builds a table of rules in memory, then reads hotplug operations from the pipe (sent by the first instance). When there are no more events for more than a few seconds, the first instance closes the pipe and the second instance exits (freeing the used memory). On the next hotplug event a new pipe / second instance is created.

I have a similar idea, but slightly different. I'd like to separate the netlink listener and the event handler. I am thinking of using http://git.r-36.net/nldev/ which basically does the same thing as s6-devd: minimal daemon that listens on netlink and for each event fork/execs mdev. What I'd like to do is:

change mdev to: - be able to read events from stdin, same format as from the netlink socket. - set a timeout on stdin (1 sec or so by default). When the timeout is reached (no event within a sec), just exit.

change nldev to: - have an mdev pipe fd which we forward the kernel events to. - on kernel event, if mdev_pipe_fd is -1 then: create pipe, fork and exec mdev with args to have mdev read from stdin (as explained above); else: write the kernel event to the pipe fd - the mdev pipe fd is added to the poll(2) call so we catch POLLHUP to detect mdev timeout.
When that happens, set the mdev pipe fd to -1 so we know that it needs to be respawned on the next kernel event.

The benefits: - the netlink listener, which needs to be running at all times, is very minimal. - when there are many events within a short time (eg coldplugging), we avoid the many forks and gain performance. - when there are no events, mdev will timeout and exit. - busybox mdev does not need to set up a netlink socket. (less intrusive changes in busybox)

Then I'd like to do something similar with modprobe: - add support to read modalias from stdin and have 1 sec timeout. - have nldev pipe/fork/exec modprobe --stdin on MODALIAS events. That way we can also avoid the many modprobe forks during coldplug. -nc
Re: RFD: Rework/extending functionality of mdev
On 09/03/2015 09:41, Natanael Copa wrote: What I'd like to do is: change mdev to: - be able to read events from stdin. same format as from netlink socket.

The thing is, the format from the netlink socket is a bit painful to parse; the hard part of netlink listeners isn't actually listening to the netlink (which is as small and easy as a socket() call), but parsing the messages. If your goal is to keep the netlink stuff out of mdev, you're probably better off designing a simpler format to pass the events on stdin. If you don't want to do that, you might as well make mdev listen to the netlink itself - you'll spare one extra process.

The information that a hotplug helper needs is actually very simple: it's a small dictionary like an envp, and /proc/sys/kernel/hotplug forks the helper with just the right information in its environment. So I'm going to suggest the following format: when the netlink listener gets an event, it sends a sequence of null-terminated VARIABLE=VALUE strings to the helper's stdin, and the end of the set is signaled by an extra null character. Example: HOME=/\0TERM=linux\0ACTION=something\0DEVPATH=/foo\0\0

- set a timeout on stdin (1 sec or so by default). when time out is reached (no event within a sec) then just exit.

I'm still doubtful of the benefits of that approach. mdev is small, and when it's not doing anything, it's not using many resources. And you have to commit those resources anyway, since a kernel event might appear at any time and you don't want to OOM because of that - so, why not simply keep a long-lived helper ? -- Laurent
RFD: Rework/extending functionality of mdev
Hi, I'm currently in the phase of thinking about extending the functionality of mdev. As the experts working with this kind of software are here, I'd like to hear your ideas before I start hacking the code. I'd like to focus on the following topics:

1) Starting up a system with mdev usually involves the same steps: mount proc, sys and a tmpfs on dev, then add some subdirectories to /dev, set owner, group and permissions, symlink some entries and possibly mount more virtual file systems. This work needs to be done by the controlling startup script, so we reinvent the wheel in every system setup. I'd like to extend the syntax of mdev.conf with some extra information and add code to mdev to allow doing this operation in a more simplified way, but still under the full control of the system maintainer. Those extra entries will only be executed with mdev -s, not during hotplug. The syntax has been chosen to not (horribly) break existing mdev.conf setups.

Current major syntax of mdev.conf:

[-][envmatch]device regex uid:gid permissions [...]
[envmatch]@maj[,min1[-min2]] uid:gid permissions [...]
$envvar=regex uid:gid permissions [...]

- Additional syntax to mount (virtual) file systems:

MOUNTPOINT UID:GID PERMISSIONS %FSTYPE [=DEVICE] [OPTIONS]

This rule is triggered by the percent sign introducing the file system type. It shall create the mount point (if it does not exist), set owner/group/permissions of the mount point, fork and exec mount -t FSTYPE -o OPTIONS DEVICE MOUNTPOINT. If DEVICE is not specified, the literal "virtual" shall be used. e.g.

# mount virtual file systems
/proc     root:root 0755 %proc
/sys      root:root 0755 %sysfs
/dev      root:root 0755 %tmpfs size=64k,mode=0755
/dev/pts  root:root 0755 %devpts

This will do all the required mounting with a single mdev -s invocation, even on a system which has nothing else mounted. The old behavior of mounting the file systems in the calling scripts will still be available; just leave the mount lines out of mdev.conf.
- Additional syntax to add directories and set their owner information:

DIRNAME/ UID:GID PERMISSIONS [ LINKNAME]

This rule is triggered by a slash as the last character of the match string. It shall create the given directory, where relative names are relative to the expected /dev base. e.g.

# add required subdirectories to the device file system
loop/  root:root 0755
input/ root:root 0755

Those directories may be created automatically due to other rules, but then you can't control their owner information. The extra rule allows creating the subdirectories on startup and setting the owner information as you like. Later matching device rules will not change this, so you can tune the directory and device permissions.

- Additional syntax to add symlinks and set their owner information:

PATHNAME@ UID:GID LINKNAME

This rule is triggered by the at sign as the last character of the match string. It shall add the given PATHNAME as a symlink to LINKNAME, and set the owner of the link. e.g.

# add symbolic links to the device filesystem
fd@     root:root /proc/fd
stdin@  root:root fd/0
stdout@ root:root fd/1
stderr@ root:root fd/2

- Extended syntax for symlink handling on device nodes:

The current syntax allows either moving the device to a different name/location, or moving it and adding a symlink. In some situations you need just a symlink pointing to the new device.

DEVICE_REGEX UID:GID PERMISSIONS =NEW_NAME (old)

Moves the new device node to the given location.

DEVICE_REGEX UID:GID PERMISSIONS PATHNAME (old)

Will create the new device node with the name PATHNAME and create a symlink DEVICE_NAME pointing to PATHNAME. Shall remove an existing symlink and create a new one.

DEVICE_REGEX UID:GID PERMISSIONS PATHNAME (new)

Shall create the new device under its expected device name, and in addition create a symlink of name PATHNAME pointing to the new device. Existing symlinks shall not be touched. e.g.
creating a /dev/cdrom symlink for the first cdrom drive:

sr[0-9]+ root:cdrom 0775 cdrom

Shall create /dev/sr0 and a symlink /dev/cdrom -> /dev/sr0, but will not overwrite the symlink for /dev/sr1, etc. This may be combined with the move option:

DEVICE_REGEX UID:GID PERMISSIONS =NEW_NAME PATHNAME

Shall move the device to NEW_NAME as expected and then create the symlink to that location. e.g. moving sr0 into a subdirectory and adding a symlink:

sr[0-9]+ root:cdrom 0775 =block/ cdrom

Shall create /dev/block/sr0 and a symlink /dev/cdrom -> /dev/block/sr0, not changing /dev/cdrom if it already exists.

2) I'd like to use netlink to obtain hotplug information and avoid massive respawning of mdev as hotplug helper when several events arrive quickly. That is, I want to auto-fork a daemon which just opens the netlink socket. When events arrive it forks again, creating a pipe. The new instance reads mdev.conf, builds a table of rules in memory, then reads hotplug
Re: RFD: Rework/extending functionality of mdev
Hi Harald !

1) Starting up a system with mdev usually involves the same steps: mount proc, sys and a tmpfs on dev, then add some subdirectories to /dev, set owner, group and permissions, symlink some entries and possibly mount more virtual file systems. This work needs to be done by the controlling startup script, so we reinvent the wheel in every system setup.

The thing is, this work doesn't always need to be done, and when it does, it does whether you use udevd, mdev, or anything else. /dev preparation is one of the possible building blocks of an init routine, and it's quite separate from mounting /proc and /sys, and it's also separate from populating /dev with mdev -s. So I'd say that it's not about reinventing the wheel, it's that every system has different needs - and a 10-line script is easy enough to copy if you have several systems with the same needs. I dislike the idea of integrating early init functions into mdev, because those functions are geographically (as in the place where they're performed) close to mdev, but they have nothing to do with mdev logically.

- Additional syntax to mount (virtual) file systems: MOUNTPOINT UID:GID PERMISSIONS %FSTYPE [=DEVICE] [OPTIONS]

I can't help but think this somehow duplicates the work of mount with /etc/fstab. If you really want to integrate the functionality in some binary instead of scripting it, I think the right way would be to patch the mount applet so it accepts any file (instead of hardcoding /etc/fstab), so you could have an early mount-points file, distinct from your /etc/fstab, that you would call mount on before running mdev -s.

2) I'd like to use netlink to obtain hotplug information and avoid massive respawning of mdev as hotplug helper when several events arrive quickly. That is, I want to auto-fork a daemon which just opens the netlink socket.
So far, what you're talking about already exists: http://skarnet.org/software/s6-linux-utils/s6-devd.html

When events arrive it forks again, creating a pipe. The new instance reads mdev.conf, builds a table of rules in memory, then reads hotplug operations from the pipe (sent by the first instance).

... but here it's different. s6-devd spawns a helper program for every event, like /proc/sys/kernel/hotplug would do, but it reads the events from the netlink so they're serialized. In your design, the long-lived process forks a unique helper that reads mdev.conf... so it's not meant to be used with any program compatible with /proc/sys/kernel/hotplug - it's only meant to be used with mdev. So, why fork at all ? Your temporary instance is unnecessary - just have a daemon that parses mdev.conf, then listens to the netlink, and handles the events itself. ... and you just reinvented udevd. Congratulations ! ;)

Seriously, udev is a hard problem. udevd is bloated and its configuration is too complex; mdev is close to the point where parsing a config file for every invocation will become too expensive - I believe that it has not reached that point yet, but if you add more stuff to the conf language, it will. What we need is a configuration language for udev that's easy to understand (for a human) and fast to parse (for a machine). More thought needs to be poured into it - and it's in my plans, but not for the short term. What I'm sure of, though, is that the "fork a helper for every event" vs. "handle events in a unique long-lived program" debate is the wrong one - it's an implementation detail, that can be solved *after* proper udev language design. -- Laurent