Re: hotplug and modalias handling (WAS: Re: RFD: Rework/extending functionality of mdev)

2015-03-18 Thread Natanael Copa
On Tue, 17 Mar 2015 17:51:39 -0600
James Bowlin bit...@gmail.com wrote:

 On Mon, Mar 16, 2015 at 09:55 AM, Natanael Copa said:
  On Fri, 13 Mar 2015 13:12:56 -0600
  James Bowlin bit...@gmail.com wrote:
  
   TL;DR: the current busybox (or my code) seems to be broken on one
   or two systems out of thousands; crucial modules don't get loaded
   using hotplug + coldplug.  Please give me a runtime option so I
   can give my users a boot-time option to try the new approach.
  
...

 My current plan is to make repeated coldplugging the default
 method for loading modules since in every case I've been able to
 investigate, an alias for the missing module(s) is in the output
 of the find command I gave above.  I haven't yet done exhaustive
 testing but every test with repeated coldplugging has worked even
 on the system where the hotplugging is flaky (and which is now
 out of my reach).

Interestingly, this is what we do on Alpine Linux too:
http://git.alpinelinux.org/cgit/mkinitfs/tree/initramfs-init.in#n530

So apparently we must have had the same problem in the past.

My current plan is to use netlink to serialize the hotplug events and,
instead of scanning /sys/devices for modalias entries, I'll trigger
'add' uevents.

(I will also make the netlink listener stop as soon as it has collected
all needed bits to set up the root fs)
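
For illustration, a minimal C sketch of what triggering 'add' uevents could
look like: walk /sys/devices and write "add" to every uevent file, so the
kernel replays the corresponding events for a coldplug run. This is only an
assumed illustration of the approach, not the actual mkinitfs or mdev code:

  #define _XOPEN_SOURCE 500
  #include <ftw.h>
  #include <stdio.h>
  #include <string.h>

  /* Walk /sys/devices and write "add" to every "uevent" file, asking the
   * kernel to re-emit the corresponding add event for a coldplug run. */
  static int trigger(const char *path, const struct stat *sb, int type,
                     struct FTW *ftw)
  {
      (void)sb;
      if (type == FTW_F && strcmp(path + ftw->base, "uevent") == 0) {
          FILE *f = fopen(path, "w");
          if (f) {
              fputs("add\n", f);   /* kernel replays the uevent for this device */
              fclose(f);
          }
      }
      return 0;                    /* keep walking, even past errors */
  }

  int main(void)
  {
      /* 16 = modest number of simultaneously open directories for nftw() */
      return nftw("/sys/devices", trigger, 16, FTW_PHYS) == 0 ? 0 : 1;
  }

A netlink listener started beforehand then sees each replayed event in order.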

 I don't know why hotplugging works most of the time but not all
 the time.

It could be a race condition.

I wonder if the problem persists if you serialize mdev using /dev/mdev.seq


 ISTM as long as I start the hotplugging before I do
 the first (and used to be only) coldplug, there is not a lot I
 can mess up.  Another check is that with hotplugging disabled
 and only a single coldplug, there are very few (if any?)
 situations where it will boot, since it usually takes a few
 seconds for the boot device to show up and that first coldplug
 is done ASAP.
 
 
 Peace, James



Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Didier Kryn


On 17/03/2015 19:56, Harald Becker wrote:

Hi Didier,

On 17.03.2015 19:00, Didier Kryn wrote:

 The common practice of daemons putting themselves in the background and
orphaning themselves is starting to be disapproved of by many designers. I
tend to share this opinion. If such a behaviour is desired, it may well
be done in the script (nohup), and the go-to-background feature can be
completely removed from the daemon proper. The idea behind this change
is to allow for a supervisor that is not process #1.


Ack, for the case where the daemon does not allow being used with an
external supervisor.


Invoking a daemon from scripts is no problem, but have you ever been in
a situation where you needed to maintain a system by hand? Therefore I
personally vote for a simple command that automatically backgrounds
the daemon, and allows running under a supervisor via a simple extra
parameter (e.g. -n). This is usually no problem, as the supervisor
needs some kind of configuration anyway, where you should be able to add
the arguments the daemon gets started with. So you enter that
parameter just once for use under the supervisor, but save extra
parameters on manual invocation.


Long-lived daemons should have both startup methods, selectable by a
parameter, so you don't make anybody's work more difficult than required.


Dropping the auto-background feature would mean saving a single
function call to fork and maybe an exit. This results in a saving
of roughly 10 to 40 bytes in the binary (typical x86 32-bit). Too
much cost to allow both usages?


OK, I think you are right, because it is a little more than a fork: 
you want to detach from the controlling terminal and start a new 
session. I agree that it is a pain to do it by hand and it is OK if 
there is a command-line switch to avoid all of it. But there must be 
this switch.
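
As an illustration of that switch, here is a minimal daemonize() sketch
along those lines (fork, setsid, detach stdio); the -n foreground flag is
just an example name, not an agreed interface:

  #include <fcntl.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  /* Background the daemon unless foreground operation is requested
   * (e.g. by a -n switch, as discussed above). */
  static void daemonize(int foreground)
  {
      if (foreground)
          return;                        /* supervisor case: stay in foreground */

      pid_t pid = fork();
      if (pid < 0)
          exit(1);
      if (pid > 0)
          exit(0);                       /* parent returns to the shell */

      setsid();                          /* new session, no controlling terminal */

      int fd = open("/dev/null", O_RDWR);
      if (fd >= 0) {
          dup2(fd, 0);                   /* detach stdio from the terminal */
          dup2(fd, 1);
          dup2(fd, 2);
          if (fd > 2)
              close(fd);
      }
  }

  int main(int argc, char **argv)
  {
      int foreground = (argc > 1 && strcmp(argv[1], "-n") == 0);
      daemonize(foreground);
      pause();                           /* stand-in for the daemon's real work */
      return 0;
  }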






 Could you clarify, please: do you mean implementing in netlink the
logic to restart fifosvd? Previously you described it as just a data
funnel.


No, restart is not required: as the netlink reader dies when fifosvd dies
(or later on, when the handler dies), the supervisor watching the netlink
reader may then fire up a new one (possibly after failure management),
where this startup is always done through a central startup command
(e.g. xdev).


The supervisor never starts the netlink reader directly, but
watches the process it starts up for xdev. xdev does its initial
action (startup code), then chains (execs) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but it is
a known practical trick to drop memory resources that are not needed by
the long-lived daemon but are required by the startup code. To the
supervisor this looks like a single process, which it has started
and may watch until it exits. So from that view it looks as if the
netlink reader has created the pipe and started fifosvd, but in fact this
is done by the startup code (the difference between the flow of operation
and where the code is technically placed).
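
A rough sketch of that chaining, with hypothetical applet paths: xdev
creates the pipe, forks fifosvd on the read end, then execs the netlink
reader on the write end, so the supervisor only ever sees the one process
it started:

  #include <stdlib.h>
  #include <unistd.h>

  /* xdev-style startup: create the pipe, fork fifosvd on the read end,
   * then exec the netlink reader on the write end.  The applet paths
   * are hypothetical. */
  int main(void)
  {
      int pfd[2];
      if (pipe(pfd) < 0)
          exit(1);

      pid_t pid = fork();
      if (pid < 0)
          exit(1);
      if (pid == 0) {                            /* child: becomes fifosvd */
          dup2(pfd[0], 0);                       /* read end becomes its stdin */
          close(pfd[0]);
          close(pfd[1]);
          execl("/sbin/fifosvd", "fifosvd", (char *)0);
          _exit(127);
      }

      dup2(pfd[1], 1);                           /* write end becomes stdout */
      close(pfd[0]);
      close(pfd[1]);
      execl("/sbin/nldev", "nldev", (char *)0);  /* chain to the long-lived reader */
      exit(127);
  }

Because the fork happens before the exec, the long-lived netlink reader
indeed remains the parent of fifosvd, as discussed below.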


I didn't notice this trick in your description. It is making more 
and more sense :-).


 Now look: since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it will receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it would have to wait until it receives an event from
the netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only supervision logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.






 Well, this is what I thought, but the manual says a closed write end
causes end-of-file, without mentioning that the pipe has to be empty.


End-of-file always includes the pipe being empty. Consider a pipe which
still has some data in it when the writer closes the write end. If
the reader received EOF before all data had been consumed, it
would lose some data. That would be absolutely unreliable. Therefore,
EOF is only forwarded to the read end when the pipe is empty.
 I agree that the other way wouldn't work. I'm just noting that the manual
is wrong/unclear on that point.
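
A tiny demonstration of that semantics (not from the thread, just an
illustration): data written before the write end closes is still delivered,
and read() only returns 0 (EOF) once the pipe has drained:

  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int pfd[2];
      char buf[16];

      if (pipe(pfd) < 0 || write(pfd[1], "data", 4) != 4)
          return 1;
      close(pfd[1]);                                    /* the writer is gone */

      printf("%zd\n", read(pfd[0], buf, sizeof(buf)));  /* 4: buffered data first */
      printf("%zd\n", read(pfd[0], buf, sizeof(buf)));  /* 0: EOF only now */
      return 0;
  }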





*Does anybody know the exact specification of poll behavior in this
case?*

 My experience with select(), which is roughly the same, is that it
does not detect EOF. And since fifosvd must not read the pipe, how does
it detect that it is broken?


Not detect? Are you sure you closed all open file descriptors for the write
end (a common caveat)? I have never been hit by such a case, except
when someone forgot to close all file descriptors of the write end.
 You notice that something happened on input (AFAIR), but I'm sure
you don't know what. It may be data as well. You must read() to know.


Anyway, you don't want to poll() the pipe unless mdev-i is dead 

Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Harald Becker

On 18.03.2015 10:42, Didier Kryn wrote:

Long-lived daemons should have both startup methods, selectable by a
parameter, so you don't make anybody's work more difficult than required.


 OK, I think you are right, because it is a little more than a fork:
you want to detach from the controlling terminal and start a new
session. I agree that it is a pain to do it by hand and it is OK if
there is a command-line switch to avoid all of it.



But there must be this switch.


Ack!



No, restart is not required: as the netlink reader dies when fifosvd dies
(or later on, when the handler dies), the supervisor watching the netlink
reader may then fire up a new one (possibly after failure management),
where this startup is always done through a central startup command
(e.g. xdev).

The supervisor never starts the netlink reader directly, but
watches the process it starts up for xdev. xdev does its initial
action (startup code), then chains (execs) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but it is
a known practical trick to drop memory resources that are not needed by
the long-lived daemon but are required by the startup code. To the
supervisor this looks like a single process, which it has started
and may watch until it exits. So from that view it looks as if the
netlink reader has created the pipe and started fifosvd, but in fact this
is done by the startup code (the difference between the flow of operation
and where the code is technically placed).


 I didn't notice this trick in your description. It is making more
and more sense :-).


I left it out to avoid making it unnecessarily complicated; I wanted to
focus on the netlink / pipe operation.




 Now look: since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it will receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it would have to wait until it receives an event from
the netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only supervision logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.


The netlink reader (nldev) does not need to explicitly watch fifosvd via
SIGCHLD.


Either that piece of code does its job, or it fails and dies. When
fifosvd dies, the read end of the pipe is closed (by the kernel), unless
there is still a handler process (which shall process remaining events
from the pipe). As soon as there is neither a fifosvd nor a handler
process, the pipe is shut down by the kernel, and nldev gets an error when
writing to the pipe, so it knows the other end died.


You won't gain much benefit from watching SIGCHLD and reading the
process status. It will either tell you that the fifosvd process
is still running, or that it died (failed). You get the same information
from the write to the pipe: when the read end has died, you get EPIPE.
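
A minimal sketch of that detection, assuming the pipe's write end is on
stdout as in the design above: with SIGPIPE ignored, the failed write
simply reports EPIPE (the event record string is purely hypothetical):

  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      signal(SIGPIPE, SIG_IGN);   /* failed writes return EPIPE instead of killing us */

      const char msg[] = "add@/devices/example\n";      /* hypothetical event record */
      if (write(1, msg, strlen(msg)) < 0 && errno == EPIPE) {
          /* the read side (fifosvd and any handler) is gone: start failure handling */
          fprintf(stderr, "pipe reader died\n");
          return 1;
      }
      return 0;
  }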


Limiting the time nldev tries to write to the pipe would also
allow detecting stuck operation of fifosvd / the handler (which SIGCHLD
watching won't give you) ... but (I discussed that with Laurent in
parallel) the question is how to react when the write to the pipe gets
stuck (but is not a failure)? We can't do much here and are in trouble
either way, but Laurent gave the argument: the netlink socket also
contains a buffer, which may hold additional events, so we do not lose
them in case processing continues normally. When the kernel buffer fills
up to its limit, let the kernel react to the problem.


... otherwise you are right: nldev's job is to detect failure of the
rest of the chain (that is, to supervise it), and it has to react to this.
The details of the actions taken in this case need to be and can be
discussed (and may be adapted later), without much impact on the rest of
the operation.


This clearly means I'm open to suggestions about which kind of failure
handling shall be done. Every action that improves the reaction and
benefits the main purpose of the netlink reader, without blowing it up
needlessly, is of interest (keep in mind: long-lived daemon, trying to
keep it simple and small).


My suggestion is: let the netlink reader detect relevant errors and
exec (not spawn) a script of a given name when there are failures. This
is small, and gives the invoked script full control over the failure
management (no fixed functionality in a binary). When done, it can
either die, letting a higher instance do the restart, or exec back and
restart the hotplug system (maybe with a different mechanism). When the
script does not exist, the default action is to exit the netlink reader
process unsuccessfully, giving a higher instance a failure indication and
the possibility to react to it.
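
A minimal sketch of that suggestion (the script path is purely an example,
not an agreed location): on unrecoverable error the reader execs the
failure script, and if that fails it exits non-zero so a higher instance
can react:

  #include <unistd.h>

  int main(void)
  {
      /* ... the netlink reader's main loop would run here and fall through
       * only on an unrecoverable error ... */

      execl("/etc/xdev/failure", "failure", (char *)0); /* replaces this process */
      return 1;       /* exec failed (e.g. no such script): signal failure upwards */
  }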




Not detect? Are you sure you closed all open file descriptors for the write
end (a common caveat)? 

Re: Add a user/password interface for a Telnet and ftp connect

2015-03-18 Thread Harald Becker

On 18.03.2015 15:50, Alexis Guilloteau wrote:

After looking at the help of the ftpd function in busybox, I know that
the main function is to create an anonymous ftp server, so I was not
surprised by the lack, but do you think there would be a solution for
that?


Busybox ftpd is an anonymous ftpd, without access restrictions. I
suggest putting the files to be served in a separate directory, using a
chroot, and running ftpd as a low-privileged user (not as root), so that
ftp access does not reach system-related files.


... otherwise you need a full ftpd package (not Busybox ftpd).


And pretty much the same thing for telnetd.


If login to telnetd is done the usual way, it should use /bin/login,
which will ask for a user name and password, but beware: all that
information is sent in clear (readable) text over the network.



Right now the only user on the board is root, with no password.


Maybe that's your problem. Have you set up your password system correctly
(/etc/passwd, /etc/shadow, /etc/group)?


... and based on information from your mail: is your inetd running in
the right directory? Does it have access to the other commands (especially
if your BB is not installed at /bin/busybox)?




Re: Add a user/password interface for a Telnet and ftp connect

2015-03-18 Thread Michael Conrad

On 03/18/2015 10:50 AM, Alexis Guilloteau wrote:

Hi,

Right now I can run a Telnet daemon on one of my boards using the
command /usb/sbin/telnet -l /bin/sh and run an ftp daemon thanks to
inetd.conf, but each of the connections is anonymous (well, it asks for
a user name but not for a password...)
After looking at the help of the ftpd function in busybox, I know that
the main function is to create an anonymous ftp server, so I was not
surprised by the lack, but do you think there would be a solution for
that? I would think it would mean working in the .c file of the
function, but maybe you have another idea.
And pretty much the same thing for telnetd. Right now the only user on
the board is root, with no password.




If you do want any sort of security, you are better off using ssh and 
sftp.  Telnet and ftp don't really offer any security even with 
passwords, so there is little interest in adding support for permissions 
to those applets.  The most popular ssh for embedded is 'dropbear': 
https://matt.ucc.asn.au/dropbear/dropbear.html


-Mike


Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Laurent Bercot

On 18/03/2015 18:08, Didier Kryn wrote:

No, you must write to the pipe to detect it is broken. And you won't
try to write before you've got an event from the netlink. This event
will be lost.


 I skimmed over that discussion (because I don't agree with the design), so
I can't make any substantial comments, but here's a nitpick: if you
use an asynchronous event loop, your selector triggers (POLLHUP for
poll(); not sure whether it's writability or exception for select()) as
soon as a pipe is broken.

 Note that events can still be lost, because the pipe can be broken
while you're reading a message from the netlink, before you come
back to the selector; so the message you just read cannot be sent.
But that is a risk you have to take every time you perform buffered I/O;
there's no way around it.
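
For illustration, a small sketch of that nitpick: poll() reports errors and
hangup on a descriptor even when .events is 0, so a selector can notice the
broken pipe without writing to it (checking both POLLERR and POLLHUP, since
which one is reported can depend on which end of the pipe you hold):

  #include <poll.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Returns 1 if the pipe's write end reports error/hangup, 0 otherwise. */
  static int pipe_broken(int input_fd, int pipe_wr_fd)
  {
      struct pollfd pfd[2] = {
          { .fd = input_fd,   .events = POLLIN },
          { .fd = pipe_wr_fd, .events = 0 },      /* errors/hangup always reported */
      };
      if (poll(pfd, 2, 0) < 0)
          return -1;
      return (pfd[1].revents & (POLLERR | POLLHUP)) ? 1 : 0;
  }

  int main(void)
  {
      int pfd[2];
      if (pipe(pfd) < 0)
          return 1;
      close(pfd[0]);                                    /* simulate the reader dying */
      printf("broken: %d\n", pipe_broken(0, pfd[1]));   /* prints "broken: 1" */
      return 0;
  }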

--
 Laurent


Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Harald Becker

Hi Laurent !


  Note that events can still be lost, because the pipe can be broken
while you're reading a message from the netlink, before you come
back to the selector; so the message you just read cannot be sent.
But that is a risk you have to take every time you perform buffered I/O;
there's no way around it.


To make clear which case we are talking about:

- spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- re-spawn conf parser / device operation process
- exit with failure
- ...
- detect failure loop
- spawn failure script
- exit with failure or not zero status
- giving up, close read end of pipe
- let fifosvd die

@Laurent: What would you do in that case?

Endless respawn? - shrek!

--
Harald


Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Harald Becker

Hi Laurent !

 On 18/03/2015 18:08, Didier Kryn wrote:

No, you must write to the pipe to detect it is broken. And you won't
try to write before you've got an event from the netlink. This event
will be lost.


On 18.03.2015 18:41, Laurent Bercot wrote:

  I skimmed over that discussion (because I don't agree with the design)


Why?

Did you note my last two alternatives, unexpectedly both named #3?
... but specifically the last one, "Netlink the Unix way"?

- uses a private pipe for netlink and a named pipe for the hotplug helper
  (with a maximum of code sharing)

- should most likely do the flow of operation as you suggested
  (as far as I understood you)

- except I split off the pipe watcher / on-demand startup code of the
conf parser / device operation into its own thread (process), for
general code reusability as a different applet, for on-demand pipe-consumer
startup purposes

(you had that function as an integral part of your netlink reader)

- and I'm currently going to split off that one-shot xdev init feature
from xdev, creating a separate applet / command for it, as you suggested
(extending the functionality for even more general usage, as suggested by
Isaac, independent from the device management, and maybe modifiable in
its operation by changing functions in a shell script)


So why do you still doubt the design? ... because I moved some
code into its own (small) helper thread?




I can't make any substantial comments, but here's a nitpick: if you
use an asynchronous event loop, your selector triggers (POLLHUP for
poll(); not sure whether it's writability or exception for select()) as
soon as a pipe is broken.


This is what I expected, but the problem is that the question about it
came up, and I can't find the location where this is documented.




  Note that events can still be lost, because the pipe can be broken
while you're reading a message from the netlink, before you come
back to the selector; so the message you just read cannot be sent.
But that is a risk you have to take every time you perform buffered I/O;
there's no way around it.


OK, what would you do then? Unbuffered I/O on the pipe, and then what?

... if that single additional dropped message, besides the others not yet
read from the netlink buffer (lost on close), matters, then we shall indeed
use unbuffered I/O on the pipe, and only read a message when there is
room for one more message in the pipe:


  set non-blocking I/O on stdout
  establish netlink socket
  loop:
    poll for write on stdout possible, until available
    (may set an upper timeout limit, failure on timeout)
    poll for netlink read and still for write on stdout
    if write ability drops:
      we are in serious trouble, failure
    if netlink read possible:
      gather message from netlink
      write message to stdout (should never block)
      on EAGAIN, EINTR: do 3 write retries, then failure

... does that fit better? I don't think it makes a big difference,
but I can live with the slightly bigger code.
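
A compilable sketch of that flow, under the assumption of a
NETLINK_KOBJECT_UEVENT socket and the pipe on stdout (Linux-specific,
error handling kept minimal, and the retry loop from the pseudocode left
out):

  #include <fcntl.h>
  #include <linux/netlink.h>
  #include <poll.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
      struct sockaddr_nl sa;
      char buf[8192];
      int nl;

      memset(&sa, 0, sizeof(sa));
      sa.nl_family = AF_NETLINK;
      sa.nl_groups = 1;                         /* kernel uevent multicast group */

      nl = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
      if (nl < 0 || bind(nl, (struct sockaddr *)&sa, sizeof(sa)) < 0)
          return 1;

      fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);   /* non-blocking stdout */

      for (;;) {
          /* wait until the pipe can take more data (30 s upper limit) */
          struct pollfd out = { .fd = 1, .events = POLLOUT };
          if (poll(&out, 1, 30000) <= 0 || (out.revents & (POLLERR | POLLHUP)))
              return 1;

          /* wait for a netlink message, keep watching the pipe for errors */
          struct pollfd pfd[2] = {
              { .fd = nl, .events = POLLIN },
              { .fd = 1,  .events = 0 },
          };
          if (poll(pfd, 2, -1) < 0 || (pfd[1].revents & (POLLERR | POLLHUP)))
              return 1;                         /* serious trouble: pipe went away */
          if (!(pfd[0].revents & POLLIN))
              continue;

          /* gather one message and forward it; should not block now */
          ssize_t n = recv(nl, buf, sizeof(buf), 0);
          if (n <= 0)
              return 1;
          if (write(1, buf, (size_t)n) != n)
              return 1;                         /* EAGAIN/EPIPE etc.: give up here */
      }
  }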


My problem is not the detection of the failing pipe write, but the
reaction to it. When that happens, the downstream side of the pipe most
likely needs more than just a restart. That is, it should only happen on
a serious failure in the conf file or the device operations (manual action
required). So I expect the loss of more event messages than just that
single one you were grumbling about. Hence, on hotplug restart
we need to re-trigger the plug events anyway!


--
Harald



Re: RFD: Rework/extending functionality of mdev

2015-03-18 Thread Didier Kryn


On 18/03/2015 13:34, Harald Becker wrote:

On 18.03.2015 10:42, Didier Kryn wrote:

Long-lived daemons should have both startup methods, selectable by a
parameter, so you don't make anybody's work more difficult than required.


 OK, I think you are right, because it is a little more than a fork:
you want to detach from the controlling terminal and start a new
session. I agree that it is a pain to do it by hand and it is OK if
there is a command-line switch to avoid all of it.



But there must be this switch.


Ack!



No, restart is not required: as the netlink reader dies when fifosvd dies
(or later on, when the handler dies), the supervisor watching the netlink
reader may then fire up a new one (possibly after failure management),
where this startup is always done through a central startup command
(e.g. xdev).

The supervisor never starts the netlink reader directly, but
watches the process it starts up for xdev. xdev does its initial
action (startup code), then chains (execs) to the netlink reader. This
may look ugly and unnecessarily complicated at first glance, but it is
a known practical trick to drop memory resources that are not needed by
the long-lived daemon but are required by the startup code. To the
supervisor this looks like a single process, which it has started
and may watch until it exits. So from that view it looks as if the
netlink reader has created the pipe and started fifosvd, but in fact this
is done by the startup code (the difference between the flow of operation
and where the code is technically placed).


 I didn't notice this trick in your description. It is making more
and more sense :-).


I left it out to avoid making it unnecessarily complicated; I wanted to
focus on the netlink / pipe operation.




 Now look: since nldev (let's call it by its name) is execed by
xdev, it remains the parent of fifosvd, and therefore it will receive
the SIGCLD if fifosvd dies. This is the best way for nldev to watch
fifosvd. Otherwise it would have to wait until it receives an event from
the netlink and tries to write it to the pipe, hence losing the event and
the possible burst following it. nldev must die on SIGCLD (after piping
available events, though); this is the only supervision logic it must
implement, but I think it is critical. And it is the same if nldev is
launched with a long-lived mdev-i without a fifosvd.


The netlink reader (nldev) does not need to explicitly watch fifosvd
via SIGCHLD.


Either that piece of code does its job, or it fails and dies. When
fifosvd dies, the read end of the pipe is closed (by the kernel), unless
there is still a handler process (which shall process remaining events
from the pipe). As soon as there is neither a fifosvd nor a handler
process, the pipe is shut down by the kernel, and nldev gets an error when
writing to the pipe, so it knows the other end died.


No, you must write to the pipe to detect it is broken. And you 
won't try to write before you've got an event from the netlink. This 
event will be lost.


You won't gain much benefit from watching SIGCHLD and reading the
process status. It will either tell you that the fifosvd
process is still running, or that it died (failed). You get the same
information from the write to the pipe: when the read end has died, you get EPIPE.


You get the information immediately from SIGCLD. You get it too
late from the pipe, and you lose at least one event for sure, a whole
burst if there is one.




Limiting the time nldev tries to write to the pipe would also
allow detecting stuck operation of fifosvd / the handler (which SIGCHLD
watching won't give you) ... but (I discussed that with
Laurent in parallel) the question is how to react when the write to the
pipe gets stuck (but is not a failure)? We can't do much here and are in
trouble either way, but Laurent gave the argument: the netlink socket also
contains a buffer, which may hold additional events, so we do not lose
them in case processing continues normally. When the kernel buffer fills
up to its limit, let the kernel react to the problem.

Sure, the limit here is pipe size (adjustable) + netlink buffer size.
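
Both limits can indeed be enlarged on Linux; a small sketch, with arbitrary
1 MiB sizes as examples:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  int main(void)
  {
      int pfd[2];
      if (pipe(pfd) < 0)
          return 1;

      /* grow the pipe buffer (Linux >= 2.6.35); returns the size actually set */
      printf("pipe size: %d\n", fcntl(pfd[1], F_SETPIPE_SZ, 1 << 20));

      /* grow a socket receive buffer (here a throw-away datagram socket;
       * for nldev it would be the netlink socket) */
      int s = socket(AF_UNIX, SOCK_DGRAM, 0);
      if (s >= 0) {
          int rcvbuf = 1 << 20;
          setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
          close(s);
      }
      return 0;
  }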


... otherwise you are right: nldev's job is to detect failure of the
rest of the chain (that is, to supervise it), and it has to react to this.
The details of the actions taken in this case need to be and can be
discussed (and may be adapted later), without much impact on the rest of
the operation.


This clearly means I'm open to suggestions about which kind of failure
handling shall be done. Every action that improves the reaction and
benefits the main purpose of the netlink reader, without blowing it up
needlessly, is of interest (keep in mind: long-lived daemon, trying to
keep it simple and small).


My suggestion is: let the netlink reader detect relevant errors and
exec (not spawn) a script of a given name when there are failures. This
is small, and gives the invoked script full control over the failure
management (no fixed functionality in a binary). When