Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)
On Tue, 17 Mar 2015 17:51:39 -0600 James Bowlin bit...@gmail.com wrote:

On Mon, Mar 16, 2015 at 09:55 AM, Natanael Copa said:

On Fri, 13 Mar 2015 13:12:56 -0600 James Bowlin bit...@gmail.com wrote:

TL;DR: the current busybox (or my code) seems to be broken on one or two systems out of thousands; crucial modules don't get loaded using hotplug + coldplug. Please give me a runtime option so I can give my users a boot-time option to try the new approach.

...

My current plan is to make repeated coldplugging the default method for loading modules, since in every case I've been able to investigate, an alias for the missing module(s) is in the output of the find command I gave above. I haven't yet done exhaustive testing, but every test with repeated coldplugging has worked, even on the system where the hotplugging is flaky (and which is now out of my reach).

Interestingly, this is what we do on Alpine Linux too: http://git.alpinelinux.org/cgit/mkinitfs/tree/initramfs-init.in#n530

So apparently we must have had the same problem in the past.

My current plan is to use netlink to serialize the hotplug events and, instead of scanning /sys/devices for modalias entries, I'll trigger 'add' uevents. (I will also make the netlink listener stop as soon as it has collected all the bits needed to set up the root fs.)

I don't know why hotplugging works most of the time but not all the time. It could be a race condition.

I wonder if the problem persists if you serialize mdev using /dev/mdev.seq.

ISTM as long as I start the hotplugging before I do the first (and what used to be the only) coldplug, there is not a lot I can mess up. Another check is that with hotplugging disabled and only a single coldplug, there are very few (if any?) situations where it will boot, since it usually takes a few seconds for the boot device to show up and that first coldplug is done ASAP.

Peace, James

___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
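[Editor's sketch] The "repeated coldplugging" described above can be illustrated roughly as follows. This is only a minimal sketch, not the code from either initramfs: the find command James refers to is not part of this excerpt, so the scan below simply assumes that coldplugging means collecting the contents of every file named modalias under a sysfs tree. The names collect_modaliases, coldplug_repeatedly, load_aliases and root_ready are hypothetical; in a real initramfs the loader callback would presumably hand the aliases to modprobe.

```python
import os
import time


def collect_modaliases(sysfs_root):
    """Return the sorted, de-duplicated modalias strings found below sysfs_root."""
    aliases = set()
    for dirpath, _dirnames, filenames in os.walk(sysfs_root):
        if "modalias" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "modalias")) as fh:
                alias = fh.read().strip()
        except OSError:
            continue  # the device may vanish between the walk and the open
        if alias:
            aliases.add(alias)
    return sorted(aliases)


def coldplug_repeatedly(sysfs_root, load_aliases, root_ready, attempts=10, delay=0.5):
    """Rescan until root_ready() is true; hand only newly seen aliases to the loader.

    load_aliases and root_ready are caller-supplied callbacks (assumptions of
    this sketch). Returns True if the root device showed up, False otherwise.
    """
    seen = set()
    for _ in range(attempts):
        new = [a for a in collect_modaliases(sysfs_root) if a not in seen]
        if new:
            load_aliases(new)
            seen.update(new)
        if root_ready():
            return True
        time.sleep(delay)
    return False
```

Rescanning picks up devices that only appear after an earlier module has been loaded, which is the situation a single early coldplug would miss.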
Re: RFD: Rework/extending functionality of mdev
On 17/03/2015 19:56, Harald Becker wrote:

Hi Didier,

On 17.03.2015 19:00, Didier Kryn wrote:

The common practice of daemons putting themselves in the background and orphaning themselves is starting to be disapproved of by many designers. I tend to share this opinion. If such behaviour is desired, it may well be done in the script (nohup), and the go-to-background feature be completely removed from the daemon proper. The idea behind this change is to allow for the supervisor not being process #1.

Ack, for the case where the daemon cannot be used with an external supervisor. Invoking a daemon from scripts is no problem, but have you ever been in a situation where you needed to maintain a system by hand? Therefore I personally vote for a simple command that backgrounds the daemon automatically and can still be run from a supervisor by means of a simple extra parameter (e.g. -n). That is usually no problem, as the supervisor needs some kind of configuration anyway, in which you should be able to add the arguments the daemon gets started with. So you have to enter that parameter just once for use from a supervisor, but save extra parameters for manual invocation. Long-lived daemons should have both startup methods, selectable by a parameter, so that you make nobody's work more difficult than required. Dropping the auto-background feature would mean saving a single function call to fork and maybe an exit. This would save roughly 10 to 40 bytes in the binary (typical 32-bit x86). Too much cost to allow both usages?

OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand, and it is OK if there is a command-line switch to avoid all of it. But there must be this switch.

Could you clarify, please: do you mean implementing in netlink the logic to restart fifosvd? Previously you described it as just a data funnel.

No, restart is not required: netlink dies when fifosvd dies (or later on, when the handler dies), and the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts the netlink reader directly, but watches the process it starts for xdev. xdev does its initial action (startup code), then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources that are required by the startup code but not needed by the long-lived daemon. To the supervisor instance this looks like a single process, which it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and where the code is technically placed).

I didn't notice this trick in your description. It is making more and more sense :-).

Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it will receive SIGCLD if fifosvd dies. This is the best way for nldev to watch fifosvd. Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd.

Well, this is what I thought, but the manual says an empty end causes end-of-file, without mentioning the pipe being empty.

End-of-file always includes the pipe being empty. Consider a pipe which still has some data in it when the writer closes the write end. If the reader received EOF before all data had been consumed, it would lose some data. That would be absolutely unreliable. Therefore, EOF is only forwarded to the read end when the pipe is empty.

I agree that the other way wouldn't work. Just noticing the manual is wrong/unclear on that point.

*Does anybody know the exact specification of poll behavior in this case?* My experience with select(), which is roughly the same, is that it does not detect EOF. And, since fifosvd must not read the pipe, how does it detect that it is broken?

Not detect? Are you sure you closed all open file descriptors for the write end (a common caveat)? I have never been hit by such a case, except when someone forgot to close all file descriptors of the write end.

You notice that something happened on input (AFAIR) but I'm sure you don't know what. It may be data as well. You must read() to know. Anyway you don't want to poll() the pipe unless mdev-i is dead
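[Editor's sketch] The point about end-of-file only being delivered once the pipe is empty can be checked with a few lines. The following is a minimal illustration, not code from the thread; the function name drain_after_close and the payload are made up for the example. It writes to a pipe, closes the write end, and then looks at what poll() and read() report on the read end before and after the buffered data is consumed.

```python
import os
import select


def drain_after_close(payload=b"event-1\nevent-2\n"):
    """Write payload, close the write end, then observe the read end."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, payload)
    os.close(write_fd)  # the last descriptor for the write end is gone

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)

    # Data is still buffered, so the read end is reported as readable.
    events_with_data = dict(poller.poll(0)).get(read_fd, 0)
    first = os.read(read_fd, 4096)   # returns the buffered data, not EOF

    # The pipe is now empty and has no writer left.
    events_when_empty = dict(poller.poll(0)).get(read_fd, 0)
    second = os.read(read_fd, 4096)  # b"" is how end-of-file shows up

    os.close(read_fd)
    return first, second, events_with_data, events_when_empty
```

The first read returns the buffered bytes and only the second returns the empty string. On Linux the second poll() is expected to report POLLHUP for the read end; other systems may signal the same condition as plain readability, so the flags alone do not say whether data or EOF is waiting, which matches the remark that one must read() to know.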
Re: RFD: Rework/extending functionality of mdev
On 18.03.2015 10:42, Didier Kryn wrote:

Long-lived daemons should have both startup methods, selectable by a parameter, so that you make nobody's work more difficult than required.

OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand, and it is OK if there is a command-line switch to avoid all of it. But there must be this switch.

Ack!

No, restart is not required: netlink dies when fifosvd dies (or later on, when the handler dies), and the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts the netlink reader directly, but watches the process it starts for xdev. xdev does its initial action (startup code), then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources that are required by the startup code but not needed by the long-lived daemon. To the supervisor instance this looks like a single process, which it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and where the code is technically placed).

I didn't notice this trick in your description. It is making more and more sense :-).

I left it out to avoid making things unnecessarily complicated, and I wanted to focus on the netlink / pipe operation.

Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it will receive SIGCLD if fifosvd dies. This is the best way for nldev to watch fifosvd. Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd.

The netlink reader (nldev) does not need to explicitly watch fifosvd via SIGCHLD. Either that piece of code does its job, or it fails and dies. When fifosvd dies, the read end of the pipe is closed (by the kernel), unless there is still a handler process (which will process the remaining events from the pipe). As soon as there is neither a fifosvd nor a handler process, the pipe is shut down by the kernel, and nldev gets an error when writing to the pipe, so it knows the other end died.

You won't gain much benefit from watching SIGCHLD and reading the process status. It will tell you either that the fifosvd process is still running or that it died (failed). You get the same information from the write to the pipe: when the read end has died, you get EPIPE.

Limiting the time nldev tries to write to the pipe would also allow detecting a stuck fifosvd / handler (which SIGCHLD watching won't give you) ... but (I discussed that in parallel with Laurent) the question is how to react when the write to the pipe is stuck (but there is no failure). We can't do much here, and are in trouble either way, but Laurent gave this argument: the netlink socket also contains a buffer, which may hold additional events, so we do not lose them in case processing continues normally. When the kernel buffer fills up to its limit, let the kernel react to the problem.

... otherwise you are right, nldev's job is to detect failure of the rest of the chain (that is, to supervise it), and it has to react to this. The details of the actions taken in this case need to and can be discussed (and may be adapted later), without much impact on other operation. This clearly means I'm open to suggestions about which kind of failure handling should be done. Every action that improves the reaction and benefits the major purpose of the netlink reader, without blowing it up needlessly, is of interest (bear in mind: long-lived daemon, trying to keep it simple and small).

My suggestion is: let the netlink reader detect relevant errors and exec (not spawn) a script of a given name when there are failures. This is small, and gives the invoked script full control over the failure management (no fixed functionality in a binary). When done, it can either die, letting a higher instance do the job of restarting, or exec back and restart the hotplug system (maybe with a different mechanism). When the script does not exist, the default action is to exit the netlink reader process unsuccessfully, giving a higher instance a failure indication and the possibility to react to it.

Not detect? Are you sure you closed all open file descriptors for the write end (a common caveat)?
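[Editor's sketch] Two of the mechanisms discussed in this message can be illustrated in a few lines: detecting the dead reader through EPIPE on write, and the proposed "exec a failure script, otherwise exit unsuccessfully" reaction. This is only a sketch under assumptions, not nldev code; write_event, on_chain_failure and the script path are hypothetical names. Note that EPIPE is only visible as an error when SIGPIPE is ignored or handled; Python ignores it by default, whereas a C daemon would have to arrange that itself.

```python
import os
import sys


def write_event(pipe_fd, event):
    """Return True if the event was written, False if no reader is left."""
    try:
        os.write(pipe_fd, event)
        return True
    except BrokenPipeError:  # errno EPIPE: every read descriptor is closed
        return False


def on_chain_failure(script_path):
    """Replace this process with the failure script, or exit with status 1."""
    if os.access(script_path, os.X_OK):
        # exec, not spawn: on success this call does not return, and the
        # script inherits the process ID the supervisor is watching.
        os.execv(script_path, [script_path])
    sys.exit(1)  # no usable script: report failure to the higher instance
```

As the thread points out, the write that reveals EPIPE is also the write that loses the event, so this detects the failure but does not by itself prevent the loss.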
Re: Add a user/password interface for a Telnet and ftp connect
On 18.03.2015 15:50, Alexis Guilloteau wrote:

After looking at the help of the ftpd function in busybox I know that the main function is to create an anonymous ftp server, so I was not surprised by the lack, but do you think there would be a solution for that?

Busybox ftpd is an anonymous ftpd, without access restrictions. I suggest putting the files to be served in a separate directory, using a chroot, and running ftpd as a low-privileged user (not as root), so that FTP access cannot reach system-related files. ... else you need a full ftpd package (not Busybox ftpd).

And pretty much the same thing for telnetd.

If login to telnetd is done the usual way, it should use /bin/login, which will ask for user name and password, but beware: all that information is sent in clear (readable) text over the net.

Right now the only user on the board is the root with no password.

Maybe that's your problem. Have you set up your password system correctly (/etc/passwd, /etc/shadow, /etc/group)? ... and based on information from your mail: is your inetd running in the right directory? Does it have access to the other commands (especially when your BB is not installed at /bin/busybox)?
Re: Add a user/password interface for a Telnet and ftp connect
On 03/18/2015 10:50 AM, Alexis Guilloteau wrote:

Hi,

Right now I can run a Telnet daemon on one of my boards using the command /usb/sbin/telnet -l /bin/sh and run an ftp daemon thanks to inetd.conf, but each of the connections is anonymous (well, it asks for a user name but not for a password...). After looking at the help of the ftpd function in busybox I know that the main function is to create an anonymous ftp server, so I was not surprised by the lack, but do you think there would be a solution for that? I would think it would be to work on the .c file of the function, but maybe you have another idea. And pretty much the same thing for telnetd. Right now the only user on the board is the root with no password.

If you do want any sort of security, you are better off using ssh and sftp. Telnet and ftp don't really offer any security even with passwords, so there is little interest in adding support for permissions to those applets. The most popular ssh for embedded is 'dropbear': https://matt.ucc.asn.au/dropbear/dropbear.html

-Mike
Re: RFD: Rework/extending functionality of mdev
On 18/03/2015 18:08, Didier Kryn wrote:

No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost.

I'm skimming over that discussion (because I don't agree with the design), so I can't make any substantial comments, but here's a nitpick: if you use an asynchronous event loop, your selector triggers (POLLHUP for poll(); not sure if it's writability or exception for select()) as soon as a pipe is broken.

Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent. But that is a risk you have to take every time you perform buffered IO; there's no way around it.

-- Laurent
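[Editor's sketch] The nitpick above is easy to try out: a writer can learn that the pipe is broken from poll() alone, without writing. The snippet below is a minimal illustration with a made-up function name. One detail is hedged here rather than asserted for every system: the Linux poll(2) manual documents POLLERR (not POLLHUP) for the write end of a pipe whose read end has been closed, while other systems may report POLLHUP, so the sketch accepts either flag.

```python
import os
import select


def poll_write_end_around_close():
    """Return the poll() flags for a pipe's write end before and after
    the read end is closed. No data is ever written."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(write_fd, select.POLLOUT)

    before = dict(poller.poll(0)).get(write_fd, 0)  # empty pipe: writable
    os.close(read_fd)                               # the reader goes away
    after = dict(poller.poll(0)).get(write_fd, 0)   # error/hangup flag set

    os.close(write_fd)
    return before, after


def is_broken(flags):
    """True if poll() flagged the write end as having lost its reader."""
    return bool(flags & (select.POLLERR | select.POLLHUP))
```

POLLERR and POLLHUP are reported by poll() even when they are not requested, so a writer that already sits in an event loop gets this notification for free.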
Re: RFD: Rework/extending functionality of mdev
Hi Laurent!

Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent. But that is a risk you have to take every time you perform buffered IO; there's no way around it.

To make clear which case we are talking about, that is:

- spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- ...
- detect failure loop
- spawn failure script
- exit with failure or non-zero status
- giving up, close read end of pipe
- let fifosvd die

@Laurent: What would you do in that case? Endless respawn? - shrek!

-- Harald
Re: RFD: Rework/extending functionality of mdev
Hi Laurent!

On 18/03/2015 18:08, Didier Kryn wrote:

No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost.

On 18.03.2015 18:41, Laurent Bercot wrote:

I skim over that discussion (because I don't agree with the design)

Why? Did you note my last two alternatives, unexpectedly both named #3? ... but specifically the last one, "Netlink the Unix way"?

- uses a private pipe for netlink and a named pipe for the hotplug helper (with maximum code sharing)
- should most likely follow the flow of operation you suggested (as far as I understood you)
- except that I split off the pipe watcher / on-demand startup code of the conf parser / device operation into its own thread (process), for general usability as a separate applet for on-demand pipe consumer startup (you had that function as an integral part of your netlink reader)
- and I'm currently going to split off that one-shot xdev init feature from xdev, creating a separate applet / command for it, as you suggested (extending functionality for even more general usage, as suggested by Isaac, independent of device management, and maybe modifiable in its operation by changing functions in a shell script)

So why do you still have doubts about the design? ... because I moved some code into its own (small) helper thread?

I can't make any substantial comments, but here's a nitpick: if you use an asynchronous event loop, your selector triggers - POLLHUP for poll(), not sure if it's writability or exception for select()- as soon as a pipe is broken.

This is what I expected, but the problem is that the question came up and I can't find the place where this is documented.

Note that events can still be lost, because the pipe can be broken while you're reading a message from the netlink, before you come back to the selector; so the message you just read cannot be sent. But that is a risk you have to take everytime you perform buffered IO, there's no way around it.

OK, what would you do then? Unbuffered I/O on the pipe, and then what?

... if that single additional dropped message matters (apart from the others not yet read from the netlink buffer, which are lost on close), then we should indeed use unbuffered I/O on the pipe, and only read a message when there is room for one more message in the pipe:

    set non blocking I/O on stdout
    establish netlink socket
    loop:
      poll for write on stdout possible, until available
        (may set an upper timeout limit, failure on timeout)
      poll for netlink read and still for write on stdout
      if write ability drops
        we are in serious trouble, failure
      if netlink read possible
        gather message from netlink
        write message to stdout (should never block)
        on EAGAIN, EINTR: do 3 write retries, then failure

... does that fit better? I don't think it makes a big difference, but I can live with the slightly bigger code.

My problem is not the detection of the failing pipe write, but the reaction to it. When that happens, the chain downstream of the pipe most likely needs more than just a restart. That is, it should only happen on a serious failure in the conf file or the device operations (manual action required). So I expect more event messages to be lost than just the single message you were grumbling about. Hence on hotplug restart we need to re-trigger the plug events anyway!

-- Harald
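[Editor's sketch] The loop outlined in the message above can be written down roughly as follows. This is a sketch under several assumptions, not the proposed nldev: the source is any datagram socket object standing in for the netlink socket, the destination is a pipe file descriptor rather than stdout, and the names forward_events and ChainFailure as well as the max_events bound (added so the function can terminate) are hypothetical. Python retries EINTR on its own, so only the EAGAIN case is retried explicitly.

```python
import os
import select


class ChainFailure(Exception):
    """The consumer side of the pipe is gone or stuck."""


BROKEN = select.POLLERR | select.POLLHUP | select.POLLNVAL


def forward_events(src, dst_fd, max_events, write_timeout_ms=1000, retries=3):
    """Copy up to max_events datagrams from src to dst_fd; return the count."""
    os.set_blocking(dst_fd, False)

    wait_out = select.poll()
    wait_out.register(dst_fd, select.POLLOUT)
    wait_in = select.poll()
    wait_in.register(src.fileno(), select.POLLIN)
    wait_in.register(dst_fd, 0)  # error/hangup flags are reported regardless

    forwarded = 0
    while forwarded < max_events:
        # Step 1: do not take a message before the pipe can accept one.
        flags = dict(wait_out.poll(write_timeout_ms)).get(dst_fd, 0)
        if flags & BROKEN or not flags & select.POLLOUT:
            raise ChainFailure("pipe broken or not writable in time")

        # Step 2: wait for an event while still watching the pipe.
        ready = dict(wait_in.poll())
        if ready.get(dst_fd, 0) & BROKEN:
            raise ChainFailure("reader went away while waiting")

        # Reading at most PIPE_BUF bytes keeps the pipe write atomic.
        message = src.recv(select.PIPE_BUF)
        for _attempt in range(1 + retries):
            try:
                os.write(dst_fd, message)
                break
            except BlockingIOError:  # EAGAIN: retry a few times
                continue
            except BrokenPipeError as exc:
                raise ChainFailure("reader went away during write") from exc
        else:
            raise ChainFailure("write kept blocking")
        forwarded += 1
    return forwarded
```

Even with this ordering, the message taken from the source in step 2 is lost if the reader dies between the poll and the write, which is the residual risk Laurent describes; the sketch only narrows that window.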
Re: RFD: Rework/extending functionality of mdev
On 18/03/2015 13:34, Harald Becker wrote:

On 18.03.2015 10:42, Didier Kryn wrote:

Long-lived daemons should have both startup methods, selectable by a parameter, so that you make nobody's work more difficult than required.

OK, I think you are right, because it is a little more than a fork: you want to detach from the controlling terminal and start a new session. I agree that it is a pain to do it by hand, and it is OK if there is a command-line switch to avoid all of it. But there must be this switch.

Ack!

No, restart is not required: netlink dies when fifosvd dies (or later on, when the handler dies), and the supervisor watching netlink may then fire up a new netlink reader (possibly after failure management), where this startup is always done through a central startup command (e.g. xdev). The supervisor never starts the netlink reader directly, but watches the process it starts for xdev. xdev does its initial action (startup code), then chains (exec) to the netlink reader. This may look ugly and unnecessarily complicated at first glance, but it is a known practical trick to drop some memory resources that are required by the startup code but not needed by the long-lived daemon. To the supervisor instance this looks like a single process, which it has started and may watch until it exits. So from that view it looks as if netlink has created the pipe and started fifosvd, but in fact this is done by the startup code (the difference between the flow of operation and where the code is technically placed).

I didn't notice this trick in your description. It is making more and more sense :-).

I left it out to avoid making things unnecessarily complicated, and I wanted to focus on the netlink / pipe operation.

Now look, since nldev (let's call it by its name) is execed by xdev, it remains the parent of fifosvd, and therefore it will receive SIGCLD if fifosvd dies. This is the best way for nldev to watch fifosvd. Otherwise it would have to wait until it receives an event from the netlink and tries to write it to the pipe, hence losing the event and the possible burst following it. nldev must die on SIGCLD (after piping available events, though); this is the only supervision logic it must implement, but I think it is critical. And it is the same if nldev is launched with a long-lived mdev-i without a fifosvd.

The netlink reader (nldev) does not need to explicitly watch fifosvd via SIGCHLD. Either that piece of code does its job, or it fails and dies. When fifosvd dies, the read end of the pipe is closed (by the kernel), unless there is still a handler process (which will process the remaining events from the pipe). As soon as there is neither a fifosvd nor a handler process, the pipe is shut down by the kernel, and nldev gets an error when writing to the pipe, so it knows the other end died.

No, you must write to the pipe to detect it is broken. And you won't try to write before you've got an event from the netlink. This event will be lost.

You won't gain much benefit from watching SIGCHLD and reading the process status. It will tell you either that the fifosvd process is still running or that it died (failed). You get the same information from the write to the pipe: when the read end has died, you get EPIPE.

You get the information immediately from SIGCLD. You get it too late from the pipe, and you lose at least one event for sure, and a whole burst if there is one.

Limiting the time nldev tries to write to the pipe would also allow detecting a stuck fifosvd / handler (which SIGCHLD watching won't give you) ... but (I discussed that in parallel with Laurent) the question is how to react when the write to the pipe is stuck (but there is no failure). We can't do much here, and are in trouble either way, but Laurent gave this argument: the netlink socket also contains a buffer, which may hold additional events, so we do not lose them in case processing continues normally. When the kernel buffer fills up to its limit, let the kernel react to the problem.

Sure, the limit here is pipe size (adjustable) + netlink buffer size.

... otherwise you are right, nldev's job is to detect failure of the rest of the chain (that is, to supervise it), and it has to react to this. The details of the actions taken in this case need to and can be discussed (and may be adapted later), without much impact on other operation. This clearly means I'm open to suggestions about which kind of failure handling should be done. Every action that improves the reaction and benefits the major purpose of the netlink reader, without blowing it up needlessly, is of interest (bear in mind: long-lived daemon, trying to keep it simple and small).

My suggestion is: let the netlink reader detect relevant errors and exec (not spawn) a script of a given name when there are failures. This is small, and gives the invoked script full control over the failure management (no fixed functionality in a binary). When