On 10/27/2014 01:29 PM, Nicolas Pitre wrote:
> On Fri, 24 Oct 2014, Geert Uytterhoeven wrote:
> 
>> Several patches are linked from
>> http://elinux.org/Deferred_Initcalls
>>
>> Latest version is
>> http://elinux.org/images/5/51/0001-Port-deferred-initcalls-to-3.10.patch
> 
> In the hope of providing some constructive and concrete feedback to this 
> thread, here's what I have to say about the patch linked above ( I 
> looked only at the latest version):
> 
> - Commented out code is not acceptable for mainline. But everyone knows 
>   that already.
> 
> - Returning a null byte through the /proc file is dubious.
> 
> - The /proc interface is probably not the best. I'd go with an entry in 
>   /sys/kernel instead.
> 
> - If the deferred_initcall section is empty, this could return 1 upfront 
>   and do the free_initmem() earlier as it used to.
> 
> - It was mentioned somewhere that the config system could use a 4th 
>   state in addition to n, m and y.  That would be required before this 
>   goes upstream simply to express all the dependencies between modules.  
>   Right now if a core module is configured with m, then all the 
>   submodules that depend on it inherit the modular-only restriction.  
>   The same would need to be enforced for deferred initcalls.
> 
> - Currently all deferred initcalls are lumped together in a single 
>   section with no regards to the original initcall level. This is likely 
>   to cause trouble if two initcalls are called in a different order than 
>   intended. Nothing prevents that from happening right now.
> 
> This patch is still not generic enough for mainline inclusion IMHO.  It 
> currently falls in the "you better know what you're doing" category and 
> that is possibly good enough for its actual users.  Trying to make this 
> more generic is going to require some more work.  And this would have to 
> come with serious arguments explaining why simply using modules in the 
> first place is not acceptable.

Sorry to take so long to reply.  This feedback is very welcome,
and I appreciate the time taken to review the patch.  I
apologize in advance for the rather long response...

I have been thinking about the points you made previously,
and have given the problem space some more thought.  I agree
that as it stands this is a very niche solution, and it would
be good to think about the broader picture and how things
might be designed differently to make the "feature" usable
more easily and to a broader group.

Taking a step back, the overall goal is to allow user space
to do stuff while the kernel is still initializing statically
linked drivers, so the device's primary function can be ready
as soon as possible (and not wait for secondarily-needed
functionality to initialize). For things that are able to be
made into a module (and for situations where the kernel module
loading is turned on), this feature should not be needed in
its current form.  In that case, user space already has control
over module load ordering and timing.

The way the feature is expressed in the current code is that a
set of drivers are marked for deferred initialization (I'll refer
to this as issue 0).  Then, at boot: 1) most drivers are initialized
normally, 2) user space is started, and then 3) user space indicates
to the kernel that the deferred drivers should be initialized.
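
For readers who have not looked at the patch, the three boot steps above can be modelled in ordinary user-space C. This is only a rough sketch, not the code from the patch: the two arrays stand in for the normal initcall sections and the deferred section, and the driver names, `boot()` and `trigger_deferred_initcalls()` are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Events recorded in the order they happen (illustration only). */
enum boot_event { EV_SERIAL, EV_DISPLAY, EV_USERSPACE, EV_USB, EV_NET };

#define MAX_EVENTS 16
static enum boot_event boot_log[MAX_EVENTS];
static size_t boot_log_len;

static void record(enum boot_event ev)
{
    assert(boot_log_len < MAX_EVENTS);
    boot_log[boot_log_len++] = ev;
}

/* Hypothetical driver init routines. */
static int serial_init(void)  { record(EV_SERIAL);  return 0; }
static int display_init(void) { record(EV_DISPLAY); return 0; }
static int usb_init(void)     { record(EV_USB);     return 0; }
static int net_init(void)     { record(EV_NET);     return 0; }

/* Stand-ins for the normal initcall sections and the deferred section. */
static const initcall_t regular_initcalls[]  = { serial_init, display_init };
static const initcall_t deferred_initcalls[] = { usb_init, net_init };
static int deferred_done;

static void run_initcalls(const initcall_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        table[i]();
}

/* Steps 1 and 2: run the regular initcalls, then start user space. */
void boot(void)
{
    run_initcalls(regular_initcalls,
                  sizeof(regular_initcalls) / sizeof(regular_initcalls[0]));
    record(EV_USERSPACE);
}

/*
 * Step 3: called when user space asks for the deferred drivers.
 * Returns 1 if the deferred initcalls ran, 0 if they had already run.
 */
int trigger_deferred_initcalls(void)
{
    if (deferred_done)
        return 0;
    run_initcalls(deferred_initcalls,
                  sizeof(deferred_initcalls) / sizeof(deferred_initcalls[0]));
    deferred_done = 1;
    return 1;
}
```

Calling `boot()` and then `trigger_deferred_initcalls()` logs serial, display, user space, usb, net, in that order. A second trigger is a no-op.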

This is very coarse: ignoring other boot phases for the moment, it allows
only two categories of drivers, regular and deferred.
It also requires source code changes to mark the drivers to be deferred.
Finally, it requires an explicit notification from user space to complete
the process.  All of these attributes are undesirable.

There may also be an opportunity here to work out more granular driver
load ordering, which would benefit other systems (especially those that
are hitting the EPROBE_DEFER issue).

As it stands now, the ordering of the initcalls within a particular level
is pretty much arbitrary (determined by link order, usually without oversight
by the developer).  Just FYI, here are some numbers culled from a recent
kernel:

initcall macro          number of instances in kernel source
--------------          ------------------------------------
early_init              446
core_init               614
postcore_init           150
arch_init               751
subsys_init             573
fs_init                 1372
device_init             1211
late_init               440


I'm going to rattle off a few ideas.  I'm not sure which ones might
stick; I just want to bounce these around and see what people think.
Note that I didn't think of most of these; I'm mostly repeating ideas
that have already been stated, and adding a few thoughts of my own.

First, if the ordering of initialization is not the default
provided by the kernel, it needs to be recorded somewhere.  A developer
needs to express it (or a tool needs to figure it out), but if it is
going to be reflected in the final kernel behaviour (or image), the
kernel needs it at boot time (if not compile time).  The current
initcall system hardcodes a "level" for each driver initialization
routine in the source code itself, by putting it in the macro
name for each init routine.  There can
only be one such order expressed in the code itself.

For developers who wish to express another order (or priority), a
new mechanism will need to be used.  If possible, I strongly prefer
putting this into the Kconfig system, as that is where other details
about kernel configuration are stored, and there are pre-existing tools
for dealing with the format.  I am hesitant to create a special language
or config format for this (unless it is much simpler than adding something
to Kconfig).  As Nicolas pointed out, Kconfig already has information
about dependencies in terms of not allowing a driver to be a module
if a dependent module is statically linked. Having the tool warn for
violations of that ordering would be valuable.

Possibly, we could use a fourth driver state ('D' for deferred), but
this still only allows very coarse ordering granularity.
How about if we added a numeric value for each driver, and had the macro
somehow use that number in ordering or deferring the driver initialization?
Say we supported order groups 0-9, with order 8 and 9 being deferred?

So we could add something like:
CONFIG_USB_EHCI_HCD_INITORDER=9
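
As a rough illustration of how a numeric order group might be consumed, here is a user-space sketch. Everything in it is an assumption made for the example: the `ORDERED_INITCALL` macro, the `run_order_range()` helper and the two `CONFIG_EXAMPLE_*` values are hypothetical, and only the `CONFIG_USB_EHCI_HCD_INITORDER` name comes from the line above. The `#define`s stand in for values that Kconfig would generate.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Kconfig-generated values. */
#define CONFIG_USB_EHCI_HCD_INITORDER    9  /* from the proposal: deferred */
#define CONFIG_EXAMPLE_SERIAL_INITORDER  2  /* hypothetical */
#define CONFIG_EXAMPLE_DISPLAY_INITORDER 1  /* hypothetical */

#define INITORDER_FIRST_DEFERRED 8          /* groups 8 and 9 are deferred */
#define INITORDER_MAX            9

struct ordered_initcall {
    int order;                  /* order group, 0..INITORDER_MAX */
    int (*fn)(void);
};

/* Hypothetical macro pairing an init routine with its order group. */
#define ORDERED_INITCALL(fn, order) { (order), (fn) }

/* Trace of which routine ran when (illustration only). */
static char trace[8];
static size_t trace_len;

static void mark(char c)
{
    assert(trace_len < sizeof(trace));
    trace[trace_len++] = c;
}

static int ehci_init(void)    { mark('E'); return 0; }
static int serial_init(void)  { mark('S'); return 0; }
static int display_init(void) { mark('D'); return 0; }

/* Table in "link order"; the order field decides when each entry runs. */
static const struct ordered_initcall initcalls[] = {
    ORDERED_INITCALL(ehci_init,    CONFIG_USB_EHCI_HCD_INITORDER),
    ORDERED_INITCALL(serial_init,  CONFIG_EXAMPLE_SERIAL_INITORDER),
    ORDERED_INITCALL(display_init, CONFIG_EXAMPLE_DISPLAY_INITORDER),
};

/*
 * Run every initcall whose order group lies in [first, last], lowest
 * group first; entries in the same group keep their table order.
 * Returns the number of routines that were run.
 */
size_t run_order_range(int first, int last)
{
    size_t n = sizeof(initcalls) / sizeof(initcalls[0]);
    size_t ran = 0;

    assert(first >= 0 && last <= INITORDER_MAX);
    for (int order = first; order <= last; order++) {
        for (size_t i = 0; i < n; i++) {
            if (initcalls[i].order == order) {
                initcalls[i].fn();
                ran++;
            }
        }
    }
    return ran;
}
```

In this model, normal boot would call `run_order_range(0, INITORDER_FIRST_DEFERRED - 1)`, and the deferred phase would later call `run_order_range(INITORDER_FIRST_DEFERRED, INITORDER_MAX)`.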

Here are some questions...
Do all driver initialization routines have a corresponding config
variable? Also, do we really want to manually add all these CONFIG
items?  Is there a way to allow expressing a config item like this,
automatically, without having to create each one in a Kconfig file?
Is the set of routines that we might want to defer small enough that
we could get by with just defining only a specific set of these
(rather than for all possible drivers and initcalls)?  
Can we get by with just listing exceptions to default ordering, or
is something more comprehensive required?

Another possibility is a binary post-processor, which reorders
the initcall tables in the kernel, after the compile has finished.
So, rather than relying on the compiler, there would be a separate
tool to modify the kernel binary to have the desired init ordering.
The initcall macro could be extended to provide input to this tool,
and the tool could read a separate configuration file indicating
the routines that should be reordered in the boot sequence.
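
The core of such a post-processor might look something like the sketch below. It is only a model under stated assumptions: a real tool would have to locate the initcall table inside the kernel image and resolve symbols, which is not shown here. The `struct initcall_entry` layout, `reorder_initcalls()` and the "move these to the end" list format are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INITCALLS 64

/* Hypothetical view of one initcall table slot after symbol lookup. */
struct initcall_entry {
    const char *symbol;     /* name of the init routine */
    unsigned long addr;     /* address stored in the table */
};

/* Return the index of symbol in list, or -1 if it is not listed. */
static int find_in_list(const char *symbol,
                        const char *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(symbol, list[i]) == 0)
            return (int)i;
    return -1;
}

/*
 * Reorder table in place: routines not named in move_last keep their
 * link order; named routines follow, in the order the list gives.
 */
void reorder_initcalls(struct initcall_entry *table, size_t count,
                       const char *const *move_last, size_t last_count)
{
    struct initcall_entry tmp[MAX_INITCALLS];
    size_t out = 0;

    assert(count <= MAX_INITCALLS);

    /* Pass 1: unlisted entries, in their original order. */
    for (size_t i = 0; i < count; i++)
        if (find_in_list(table[i].symbol, move_last, last_count) < 0)
            tmp[out++] = table[i];

    /* Pass 2: listed entries, in list order (repeated names are skipped). */
    for (size_t j = 0; j < last_count; j++) {
        if (find_in_list(move_last[j], move_last, j) >= 0)
            continue;
        for (size_t i = 0; i < count; i++)
            if (strcmp(table[i].symbol, move_last[j]) == 0)
                tmp[out++] = table[i];
    }

    assert(out == count);
    memcpy(table, tmp, count * sizeof(*table));
}
```

With a table linked as usb_init, serial_init, net_init, display_init and a list of net_init, usb_init, the result is serial_init, display_init, net_init, usb_init.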

Another idea would be to make the starting of user space its own
initialization routine, one that is not necessarily started as the last thing
after all other statically linked driver initializations.  Then, it
could begin operation before other drivers were initialized. Its init
order could be controlled using the same mechanism as other initcalls.

Right now, user space starts as if it were a late_initcall, with an
INITORDER=9, but if this were configurable, that might solve a lot
of the problem.  A developer could push the order of user-space start
earlier into the initialization sequence, if they needed to.

If stricter ordering were required, such as making sure user space
got cycles before other drivers, then the threads managing such
initializations would need to be prioritized.  Maybe user space
could elevate its scheduling priority, or a configuration item
could indicate a high starting scheduling priority, so that user
space would be guaranteed to run before other (lower-priority)
init routines. This would allow lower-priority initializations
to proceed in piecemeal fashion (using up cycles whenever the
high-priority user-space was not busy).  The "trigger"
for allowing low-priority initializations to proceed could then be
something like the user-space thread lowering its scheduling priority
back to "normal".  This would use already-existing syscalls, and
would not require a /sys or /proc trigger mechanism.
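
On the user-space side, the "trigger" could be as small as the sketch below, which uses the existing getpriority()/setpriority() calls. The helper name `lower_priority_to()` is made up, and the sketch assumes the process was started at an elevated priority by some mechanism not shown here. Nothing in the kernel reacts to this today; the reaction to the priority drop is the hypothetical part of the idea.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Lower the calling process to the given nice value once its
 * boot-critical work is done.  Never raises priority: if the process
 * is already at or below the target priority, nothing is changed.
 * Returns 0 on success, -1 on error (errno is set).
 */
int lower_priority_to(int target_nice)
{
    int current;

    assert(target_nice >= -20 && target_nice <= 19);

    /* getpriority() can legitimately return -1, so check errno too. */
    errno = 0;
    current = getpriority(PRIO_PROCESS, 0);
    if (current == -1 && errno != 0)
        return -1;

    if (current >= target_nice)
        return 0;   /* already at or below the target priority */

    return setpriority(PRIO_PROCESS, 0, target_nice);
}
```

The boot-critical application would call `lower_priority_to(0)` once the device's primary function is up, to drop back to the default nice value.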

I'm not sure if the problem drivers (USB and networking) are
interruptible during their init routines (especially on UP machines).
This would need to be tested, to see if they can start in the
background and not cause a big delay to the higher priority task.

Grant Likely suggested deferring the ordering decision in a
way that allowed it to be expressed at runtime rather than at
compile-time. That, I think, would require a more substantial
rework of the initcall system, probably making it
text-driven.  It does have the possibility of solving some
other driver init ordering problems that are now being
addressed with EPROBE_DEFER.  My guess is that making the initcall
system text-driven would increase the size of it to a degree
that it would make more sense just to turn on the loadable
module system.  But I'm open to ideas how this might be done
efficiently.  I don't see how this could be done in a binary
fashion, as I'm pretty sure Grant would intend for this
ordering information to live outside a particular binary
instance of the kernel (similar to device tree).

I think a lot of this is what Nicolas was getting at last week,
and I didn't understand the ideas he was putting forth. Since
this is a niche case, it may not be worth rewriting the
initcall system to handle it.  But I'm interested in whether
people think this is worth working on or not.  This patch *has*
been useful (and used), so there's clearly an unfulfilled need.
And maybe this discussion can result in a solution that is more
general and amenable to mainlining.

Thanks for listening.
 -- Tim